(ii) MOLECULE TYPE: cDNA



PCT/US97/18032 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
CTCTCCCCCC CCCCTCTCTC TCTCTCTCGC ATACTAACTA GGTTTGACTG TATTACTCGT 60 

ACCAGATTTA AAATTAGACT AGCCTTGCCA CAACGCCCTA CTGAGAGGTA CTGTCGAACT 120 

GTAGACAGCA TGATGTTCTT TGATGGTGAA AGTCTAAATC TGGACCGTGT TCAGAGATAC 180 

CAAATGATGA GGCTGAAAAG GGGAAAGGGG GTTCTTCAGT CTCTTCTTCT TCTTCTTTTT 240

p^r^r^rj.rj.rj.r^rj^ CCATGATGTT TTCTCTATGG CCAGTGCAAA TGGTGTTGTC ACCCTTGCAT 300 

GTTGCCAAC 309

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 60 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Met Met Phe Phe Asp Gly Glu Ser Leu Asn Leu Asp Arg Val Gln Arg
1 5 10 15

Tyr Gln Met Met Arg Leu Lys Arg Gly Lys Gly Val Leu Gln Ser Leu
20 25 30

Leu Leu Leu Leu Phe Ile Phe Phe Ser Met Met Phe Ser Leu Trp Pro
35 40 45

Val Gln Met Val Leu Ser Pro Leu His Val Ala Asn
50 55 60

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 257 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

AGGTCTCTCT GGTTCTTTCT ATATCATCAT TTTATTATTA TGTCCTAATA TAAAGTACTG 60 

GCTCATAGGG CCAGGGTATT ATTATAGAAT ATTATTNTCG CATGTAAACA AAGATATCTT 120 

TGCTTTAAGA TGTGAGAAGA AATGAATTTA CTTTGTTTGC ATTAAGTTAN GGAAGAGTTG 180 

TAATATATAC TTTAAGAAAG AAGAGAAGAA AACTAGTATC TNTAAGCGGT AAAAAAAAAA 240 



AAAAAAAAAA AAAAAAA 257

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 467 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear



(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

CACGAGGATT GATTTCCATC TTGCCTCTCC ANAAGGCAAA ACCTTAGTTT TTGAACAAAG 60 

AAAATCAGAT GGAGTTCACA CTGNTANANA CTGAANTTGG TGATTACATG TTCTGCTTTG 120 

ACAATACATT CAGCACCATT TCTGAGAANG TGATTTTCTT TGAATTAATC CTGGATAATA 180 

TGGGAGAACA GGCACAAGAA CAAGAAGATT GGAAGAAATA TATTACTGGC ACAGATATAT 240

TGGATNTNAN NCTGGAAGAC ATCCTGGAAT CCATCAACAG CATCAAGTCC AGACTAAGCA 300 

AAAGTGGGCA CATACAAACT CTGCTTAGAG CATTTGAAGC TCGTGATCGA AACATACAAG 360 

AAAGCAACTT TGATAGAGTC AATTTCTGGT CTATGGTTAA TTTAGTGGTC ATGGTGGTGG 420 

TGTCAGCCAT TCAAGTTTAT ATGCTGAAGA GTCTGTTTGA AGATAAG 467 
(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 133 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

Met Glu Phe Thr Leu Xaa Xaa Thr Glu Xaa Gly Asp Tyr Met Phe Cys
1 5 10 15

Phe Asp Asn Thr Phe Ser Thr Ile Ser Glu Xaa Val Ile Phe Phe Glu
20 25 30

Leu Ile Leu Asp Asn Met Gly Glu Gln Ala Gln Glu Gln Glu Asp Trp
35 40 45

Lys Lys Tyr Ile Thr Gly Thr Asp Ile Leu Asp Xaa Xaa Leu Glu Asp
50 55 60

Ile Leu Glu Ser Ile Asn Ser Ile Lys Ser Arg Leu Ser Lys Ser Gly
65 70 75 80

His Ile Gln Thr Leu Leu Arg Ala Phe Glu Ala Arg Asp Arg Asn Ile
85 90 95

Gln Glu Ser Asn Phe Asp Arg Val Asn Phe Trp Ser Met Val Asn Leu
100 105 110

Val Val Met Val Val Val Ser Ala Ile Gln Val Tyr Met Leu Lys Ser
115 120 125

Leu Phe Glu Asp Lys
130

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 387 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

TGTTTGAAGA TAAGAGGAAA AGTAGAACTT AAAACTCCAA ACTAGAGNAC GTAACATTGA 60 

AAAATGAGGC ATAAAAATGC AATAAACTGT TACAGTCAAG ACCATTAATG GTNTTNTCCA 120 

AAATATTTTG AGATATAAAA GTAGGAAACA GGTATAATTT TAATGTGAAA ATTAAGTNTT 180 

CACTTTCTGT GCAAGTAATC CTGCTGATCC AGTTGTACTT AAGTGTGTAA CAGGAATATT 240

TTGCAGAATA TAGGTTTAAC TGAATGAAGC CATATTAATA ACTGCATTTT CCTAACTTTG 300

AAAAATTTTG CAAATGTCTT AGGTGATTTA AATAAATGAG TATTGGGCCT AATTGCAAAA 360

AAAAAAAAAA AAAAAAAAAA AAAAAAA 387 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 279 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
GAATTCTCTT GAAGNTGGGG GGTGCNGGNN GGGGAAANCG NNTCTCCNNT CCANAAGCGG 60 
GGGCCNTTTT GTCCGTNNNC TTGTGNAAAA AANCCCGGNG NTGGTGAACG CTGNTNTTAN 120 
TTACTCCAAA CCTCGANTGG NCNNTTNGTG GTNCNNCGCC GAGGNTGANN TGGNTCCCCC 180 
CCCCCCTGNT NNAATNCCNA AAACTNTTCN GAACCCGAAA ANAATTNTCC ATTCTGCCNN 240 
NANTGGTTTC NTCCNNCNNC TCCTNATTAA AGAAGCNNT 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 337 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
GGCGGGTGAC ATTCAGCCGG CGGTTCGGGG GGACGGANTC TCCATTCCAG AACCATGGCC 60
CAATTTGTCC GTAACCTTGT GGAGAAGACC CCGGCGCTGG TGAACGCTGC TGTGACTTAC 120
TCGAAGCCTC GATTGGCCAC ATTTTGGTAC TACGCCAAGG TTGAGCTGGT TCCTCCCACC 180
CCTGCTGAGA TCCCTAGAGC TATTCAGAGC CTGAAAAAAA TAGTCAATAG TGCTCAGACT 240
GGTAGCTTCA AACAGCTCAC AGTTAAGGAA GCTGTGCTGA ATGGTTTGGT GGCCACTGAG 300
GTGTTGATGT GGTTTTATGT CGGAGAGATT ATAGGCA 337
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 94 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

Met Ala Gln Phe Val Arg Asn Leu Val Glu Lys Thr Pro Ala Leu Val
1 5 10 15

Asn Ala Ala Val Thr Tyr Ser Lys Pro Arg Leu Ala Thr Phe Trp Tyr
20 25 30

Tyr Ala Lys Val Glu Leu Val Pro Pro Thr Pro Ala Glu Ile Pro Arg
35 40 45

Ala Ile Gln Ser Leu Lys Lys Ile Val Asn Ser Ala Gln Thr Gly Ser
50 55 60

Phe Lys Gln Leu Thr Val Lys Glu Ala Val Leu Asn Gly Leu Val Ala
65 70 75 80

Thr Glu Val Leu Met Trp Phe Tyr Val Gly Glu Ile Ile Gly
85 90



(2) INFORMATION FOR SEQ ID NO: 18: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 345 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

AAATTANAGG AAGANCCTNT TGAAAAAATT TNTGTTTGTN AAAAAGNTAG GGNAATTGTT 60 

ATTTTGGAAA TAGCCTNCCC NAGNGNGGAN AGGGGGGNAT TTTAAGNANG NTTTTTTGNA 120 

AAATTTTTNG NCGNNGGNNA GAANCNAAAA AGNGGAATTT GNNTTTTAAG GGGGNTANTT 180

GNTTGTTTGG GTTTAANACC CTTGCCAAAA ^3NAAANACCC CCAAGNNANT TNAANNAGGG 240

TATAANTTAG NATTTTTCCC TGGANTTAAA NAGNANATTA TATNCTGGAA NAAANGNAAN 300

GGTTGGTATN AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAA 345



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 456 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

AGAGATTCAG GACCTGCAGA GTCGCCAGAA GCATGAAATT GAATCTTTGT ATACTAAACT 60 

GGGCAAGGTT CCCCCTGCTG TCATTATTCC CCCAGCTGCT CCTCTGTCGG GGAGAAGAAG 120 

GAGACCCACT AAAAGCAAAG GCAGCAAGTC TAGTCGCAGC AGCTCATTGG GCAATAAAAG 180 

CCCACAGCTT TCAGGCAACC TGTCTGGTCA GAGTGGAACT TCAGTCTTAC ACCCCCAACA 240 

GACCCTCCAC CCTCCTGGCA ACATCCCANA NTCCGGGCAG AATCAGCTGT TACAGCCCCT 300 

TAAGCCATCT CCCTCCAGTG ACAACCTCTA TTCAGCCTTC ACCAGTGATG GTGCCATTTC 360 

AGTACCAAGC CTTTCTGCTC CAGGTCAAGG AACCAGCAGC ACAAACACTG TTGGGGCAAC 420 

AGTGAACAGC CAAGCCGCCC AAGCTCAGCC TCCTGC 456 
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 130 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

Met Lys Leu Asn Leu Cys Ile Leu Asn Trp Ala Arg Phe Pro Leu Leu
1 5 10 15

Ser Leu Phe Pro Gln Leu Leu Leu Cys Arg Gly Glu Glu Gly Asp Pro
20 25 30

Leu Lys Ala Lys Ala Ala Ser Leu Val Ala Ala Ala His Trp Ala Ile
35 40 45

Lys Ala His Ser Phe Gln Ala Thr Cys Leu Val Arg Val Glu Leu Gln
50 55 60

Ser Tyr Thr Pro Asn Arg Pro Ser Thr Leu Leu Ala Thr Ser Xaa Xaa
65 70 75 80

Pro Gly Arg Ile Ser Cys Tyr Ser Pro Leu Ser His Leu Pro Pro Val
85 90 95

Thr Thr Ser Ile Gln Pro Ser Pro Val Met Val Pro Phe Gln Tyr Gln
100 105 110

Ala Phe Leu Leu Gln Val Lys Glu Pro Ala Ala Gln Thr Leu Leu Gly
115 120 125

Gln Gln
130

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 188 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

TACCCTGCCC TCCTCCCTTT TTTNNACCCC TCTCTTTTTT ATTTTTTCTT TGCTCTTTAG 60 

AACCCAGTGA AAAATACCAG GGTACTGGGG TGCAACTCTT TCTTATGATA GGTCATTAGT 120 

GCTTTAAGCA AAAGATATTA GCAGCTTTGA CTGCAGCATT AGCAATTAGG NAAAAAAAAA 180 



AAAAAAAA 188

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 752 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

CCTTATGGCC TACTTTAAAA AAAAACCAAT ACCAAAGAAG CCTACAATGT TGGCCTTAGC 60 

CAAAATTCTG TTGATTTCAA CGTTGTTTTA TTCACTTCTA TCGGGGAGCC ATGGAAAAGA 120 

AAATCAAGAC ATAAACACAA CACAGAACAT NGCAGAAGTT TTTAAAACAA TGGAAAATAA 180 

ACCTATTTCT TTGGAAAGTG AAGCAAACTT AAACTCAGAT AAAGAAAATA TAACCACCTC 240 

AAATCTCAAG GCGAGTCATT CCCCTCCTTT GAATCTACCC AACAACAGCC ACGGAATAAC 300 

AGATTTCTCC AGTAACTCAT CAGCAGAGCA TTCTTTGGGC AGTCTAAAAC CCACATCTAC 360 

CATTTCCACA AGCCCTCCCT TGATCCATAG CTTTGTTTCT AAAGTGCCTT GGAATGCACC 420 

TATAGCAGAT GAAGATCTTT TGCCCATCTC AGCACATCCC AATGSTACAC CTGCTCTGTY 480 

TTCARAAAAC TTCACTTGGT CTTTGTCAAT GACACCGTGA AAACTCCTGA TAACAGTTCC 540 

ATTACAGTTA GCATCCTCTY TTCARAACCA ACTTCTCCAT CTGTGACCCC CTTGATAGTG 600 

GAACCAAGTG GATGGNTTAC CACAAACAGT GATAGNTTCA CTGGGTTTAC CCCTTATCAA 660 

GNAAAAACAA CTTTACAGCC TACCTTAAAA TTCACCAATA ATTCAAAACT NTTTCCAAAT 720 

ANGTCAGATC CCCCAAAAAA AAAAAAAAAA AA 752
(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 157 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: 

Met Leu Ala Leu Ala Lys Ile Leu Leu Ile Ser Thr Leu Phe Tyr Ser
1 5 10 15

Leu Leu Ser Gly Ser His Gly Lys Glu Asn Gln Asp Ile Asn Thr Thr
20 25 30

Gln Asn Xaa Ala Glu Val Phe Lys Thr Met Glu Asn Lys Pro Ile Ser
35 40 45

Leu Glu Ser Glu Ala Asn Leu Asn Ser Asp Lys Glu Asn Ile Thr Thr
50 55 60

Ser Asn Leu Lys Ala Ser His Ser Pro Pro Leu Asn Leu Pro Asn Asn
65 70 75 80

Ser His Gly Ile Thr Asp Phe Ser Ser Asn Ser Ser Ala Glu His Ser
85 90 95

Leu Gly Ser Leu Lys Pro Thr Ser Thr Ile Ser Thr Ser Pro Pro Leu
100 105 110

Ile His Ser Phe Val Ser Lys Val Pro Trp Asn Ala Pro Ile Ala Asp
115 120 125

Glu Asp Leu Leu Pro Ile Ser Ala His Pro Asn Xaa Thr Pro Ala Leu
130 135 140

Xaa Ser Xaa Asn Phe Thr Trp Ser Leu Ser Met Thr Pro
145 150 155

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 417 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
AAGCTTGGCA CGAGGTCTTT AGAAGAACTA CAAAACCTGA ATGGAAAACT TCGAAGTGAA 
GGACAAGGNA ATATGGGCTT TACTAGGCAG AATCACAGGG CAGAAGTTGA ATATACCGGC 
AATTTTGAGA GCACCCAAGG AGAGAAAACC AAGTAAAAAA AGAAGGAGGC ACACAAAAGA 
CATCTACTCT TCCTGCAGTA CTTTATAGTT GTGGGATTTG TAAGAAGAAC CATGATCAGC 
ATCTTCTTTT ATTGTGTGAT ACCTGTAAAC TACATTACCA TTTTGGATGT CTGGATCCTC 
CTCTAACAAG GATGCCAAGA AAGACCCAAA ACAGTTATTG GCAGTGCTCG GAATGTGACC 
AGGCAGGGAG CAGTGACATG GAAGCAGATA TGGCCATGGA AACCCTACCA GATGGAA 
(2) INFORMATION FOR SEQ ID NO: 25: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 35 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

Met Pro Arg Lys Thr Gln Asn Ser Tyr Trp Gln Cys Ser Glu Cys Asp
1 5 10 15

Gln Ala Gly Ser Ser Asp Met Glu Ala Asp Met Ala Met Glu Thr Leu
20 25 30

Pro Asp Gly
35

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 359 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 

TCTGTGTTCA GTATAATTTT ATTTTTCTCA ACCTTAAATA TGAACTTAGG AAATAAGGAG 60 

GGAAGTACAA AGATTATTGA CTATACAACN TACCAGCTGA AAGAAAGATC TTCATCAACA 120 

TCTGTATCTT TCCAGAGGTA TACAGAATTA AAATTNNATN TTCAAGCTTT AATGATCCAG 180 

TTTTAAGTCA ACGGCAGAAG TATGTTGAAT ATTTCATCAC TCAATCTTGA ACTGATTTAG 240 

AAGAGACTCT TTGCTGAAAT TGAATTGCAC TTATACATGT AAATTGTCAA CATGTAATTT 300 

GGAATTTTCT GATTAATAAA TGTGGTTTTG GACATCTAAA AAAAAAAAAA AAAAAAAAA 359
(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 675 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

CTCNCAAATC GGCNCGNGCA ACGAACGGCT TGGGCGCGGA CTGGTATCCG GGGACTGTGA 60 

CTTGCAGGGT CCGCCATGGA GCCAGAGCAG ATGCTGGAGG GACAAACGCA GGTTGCAGAA 120 

AATCCTCACT CTGAGTACGG TCTCACAGAC AACGTTGAGA GAATAGTAGA AAATGAGAAG 180 

ATTAATGCAG AAAAGTCATC AAAGCAGAAG GTAGATCTCC AGTCTTTGCC AACTCGTGCC 240 

TACCTGGATC AGACACTTGT GCCTATCTTA TTACAGGGAC TTGCTGTGCT TGCCAAGGAA 300 

AGACCACCAC ATCCCATTGA ATTTCTAGCA TCTTATCTTT TAAAAAACAA GGCACAGTTT 360 

GAAGATYGAA ACTGAMTTAA TGGGRAGAAC AGAAAAATTT AGTTGSTACT GTAGATTTAC 420 

ATGATTAAGA RGCAGCTTTA ATTGCCATGA TCATTCCCTT TTTTTGGAAG GATAAGNACC 480

TTNCGGANAA CAGNACCTAT TTTTGGGATT GCAGNAGNTA AAATATTTCC CNTATTTTGA 540

NTTAATNACC ATAAACCNTA CCTATTTAAT GNGNGTATTT TGTGCAATTT TTTTTTNAGN 600

TTGTTTTTAA ATTTGTTTTT AAAATGACCT TNAAAATNAA NTGTNNAAAC ACCiaTTTAAA 660

AAAAAAAAAA AAAAA 675

(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 99 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

Met Glu Pro Glu Gln Met Leu Glu Gly Gln Thr Gln Val Ala Glu Asn
1 5 10 15

Pro His Ser Glu Tyr Gly Leu Thr Asp Asn Val Glu Arg Ile Val Glu
20 25 30

Asn Glu Lys Ile Asn Ala Glu Lys Ser Ser Lys Gln Lys Val Asp Leu
35 40 45

Gln Ser Leu Pro Thr Arg Ala Tyr Leu Asp Gln Thr Leu Val Pro Ile
50 55 60

Leu Leu Gln Gly Leu Ala Val Leu Ala Lys Glu Arg Pro Pro His Pro
65 70 75 80

Ile Glu Phe Leu Ala Ser Tyr Leu Leu Lys Asn Lys Ala Gln Phe Glu
85 90 95

Asp Xaa Asn



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 552 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29: 

CACGAGGGTT TGGTGAGGAA ATTACCAGAG AACTATTAAA GACTTGGATG CTCTTCTCGG 60 

CTTTGCTATT AAGTAAGTTG GACAAGTTGT TTGGCTTCTT TGAGCCTCTG TTTTCTCCAT 120 

TCTAAAATTC TAAAATGGGA GTGTTGAATT AGATCAGTGG CTTTCGAACT TTCTGCTCCT 180

AGTAGTGAGA AATACATTTT ACTCCACTCC CTGGTATGTA CACGCATTCC TGTGTTTTGT 240

GAAAACCTGA CACCATGCTC CTCCCTCACT ACATGTAAAA CACTTTTATT CATTAAAAAG 300 

AAAACTGACT GGCTTGGACC TACAAATTAG TTTCATTATT TGTTAATGTT TGAAAGCCAT 360 

TAAAAGATGA ATATTAAGGT TTCTTTATAC TCAATACTTG TAGTTTTGTT TGGGGGAATG 420 

AGAGGATGCC CTTGGTACCT TTGTGAGGCC TCTCCACTGA GGGTCAATCA TGACTTCTGT 480 

TTTAAACCAG CCCATCCCAT CTTCTCCAGC TGCTCTCCTT ATGTCTTGCT TCTCTCCCCT 540 

CCAACCTTCT CA 552 
(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 62 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

Met Asn Ile Lys Val Ser Leu Tyr Ser Ile Leu Val Val Leu Phe Gly
1 5 10 15

Gly Met Arg Gly Cys Pro Trp Tyr Leu Cys Glu Ala Ser Pro Leu Arg
20 25 30

Val Asn His Asp Phe Cys Phe Lys Pro Ala His Pro Ile Phe Ser Ser
35 40 45

Cys Ser Pro Tyr Val Leu Leu Leu Ser Pro Pro Thr Phe Ser
50 55 60

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 318 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
CAGGGCCCCA TCCTTCAGTG CATTGCACAC TTTGCATGNT GGGTCAGGGA AGATTGTGGA 60
GAGAGGACAG TGCACATGGT TTCCCCCACN TNGNCTGCGT GGGGGTATGT CCTGCTTCCG 120
CCACTTCCAA CTGTGGCANT TGGGCACGCC CCTNTCAGGG CACCTTCCCT TTTTGTTTCC 180
GCAAAATGAG GTTGTAATAG TGCCTGCCGC ACTGTNTGGC ACACAGTAAG NTCTCAAGAA 240
ATGTTAGCTG TTGTTGCCGT TAGAACACCA TAGNTAGAAT ACCATACNTG GCATTCACTT 300
AAAAAAAAAA AAAAAAAA 318
(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 310 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

ATTGAGGAAA ACCACAAAAA ACTTCAAAAC AGCTACAACG GGAAAAAGAG AGTTTTGTCC 60

CACAGTCAGC AGGCCACTAG TTTATTAACT TCCAGTCACC TTGATTTTTG CTAAAATGAA 120

GACTCTGCAG TCTACACTTC TCCTGTTACT GCTTGTGCCT CTGATAAAGC CAGCACCACC 180

AACCCAGCAG GACTCACGCA TTATCTATGA TTATGGAACA GATAATTTTG AAGAATCCAT 240

ATTTAGCCAA GATTATGAGG ATAAATACCT GGATGGAAAA AATATTAAGG AAAAAGAAAC 300

TGTGATAATA 310 
(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 65 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 

Met Lys Thr Leu Gln Ser Thr Leu Leu Leu Leu Leu Leu Val Pro Leu
1 5 10 15

Ile Lys Pro Ala Pro Pro Thr Gln Gln Asp Ser Arg Ile Ile Tyr Asp
20 25 30

Tyr Gly Thr Asp Asn Phe Glu Glu Ser Ile Phe Ser Gln Asp Tyr Glu
35 40 45

Asp Lys Tyr Leu Asp Gly Lys Asn Ile Lys Glu Lys Glu Thr Val Ile
50 55 60

Ile
65

(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 303 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
CCCAAGNAAN TTTCAANTTT TTGCCTTTNC TGGCCTTTAN TGGATCCCNA AAGCATTTAA 60 
GGNANATGTT CCNAAAANTT TGNAAAGNTA AANGTTTCCC ATGATCGCTC ATTTTTTTTT 120 
TATGATTCAN ANGTTATTCC TTATAAAGTA AGNANTTTGT TTTCCTCCTA TCAAGGCAGN 180 
TATTTTATTA AATTTTTCAN TTAGTTTGAG NAATAGCAGA TAGTTTCATA TTTAGGGAAA 240 
NTTTCCAAAT AAAATAAATG TTATTNTTTG ATAAAGAGNT AAAAAAAAAA AAAAAAAAAA 300 
AAA 303

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 418 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

AAGCTTGNGC ACGNGGCACA AGTAGCTACG NCTGCAAGCA CCTGCCACCA TAAAGGGGNT 60 

GCATTTTGCC ACCATAAANG GGNTGCATTT TTTTAAAAAG CCTAGGCNGC TCTAACATCA 120 

TCTGATATGG ACACAANGCN AACAGTTTCC NTATNTACAT CCNTACCTCT AAAAGATACT 180 

TCAAAGTGAC AAAAACGTGT TCCTTCCCCA CTTAGAGACA ATGATTAACA GGGCCCTATA 240 

TGTTCTTACC ACATACAGAG GATGCATTTA TTTTTGCTCT ATGACACTTG CAAAAATCTC 300 

TACTGTAATT AATTTGGGTC TATTATTAAC TCTCTGTTCC ATCATAGAAT GTGGCCAGGC 360 

CTTACAATGG AGAGCCAGAG TTAAAACTTC AAGTTGCATC TGTTTTTGGG CTGAGTCA 418 
(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

Met Thr Leu Ala Lys Ile Ser Thr Val Ile Asn Leu Gly Leu Leu Leu
1 5 10 15

Thr Leu Cys Ser Ile Ile Glu Cys Gly Gln Ala Leu Gln Trp Arg Ala
20 25 30

Arg Val Lys Thr Ser Ser Cys Ile Cys Phe Trp Ala Glu Ser
35 40 45



(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 331 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
AGTTTGTTCT GTAAATATTT NGAAAAGTGA CAGCTNTCAA CTTCAGGGTA ACTATTTCTA 60
AAAATGTAAA TANGTATTAA TCCTTGTATC TTTTATGGTA ATTTNGCATA TTGATATGAA 120
TTANATAAAA TTGTTTAAAA TAAAAGGTGT CCTTGAATTA CTGACCACCC ATAGATGTNT 180
ACTGTTACCA GGTTTTACAA TGCAAATTTT CACTAATACC TGGGTTTAAT ACAGCTCACA 240
TCACTGAATG TTACACATGA GTTTAAATGG GTTAATATAC AGGTTTTGTT ATAATAAAGT 300
TACTGATTAA ATTAAAAAAA AAAAAAAAAA A 331
(2) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 583 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:

CACGNGGGTG AGGCCGACTG CTGAAGACAG CTCGCCACCC TCCTTGCCTC CACTCCAATC 60 

CAGGGGCTGG GGCCACATTC TTTGCCTTCA TTTATCCTCA GATCAGGTGA GATCGACAGG 120 

AGGTGTTGAT GGCAGTGCCA GCAATTATTG CTAATCCGTT TGCATCCTTA TGCATAGATC 180 

TGAATTCAGA CTTTGTGAAT TTCCAGAGGT GTGGGTNATA TAATAGAATT CAGTGAGTGG 240 

GCATGGCTGA TCTTGTGCAA ATTAAAAGTT ATGGGGCATA AGAATAGCAA AAGTTGAACT 300 

TCTTTTAAAA AGGAAAGTAC CCTGAGAGCC AGTATTGGTT GAGGCTCTTC AGTATGCCCA 360 

GGTTGGCAGC ACTGAGAACC GCAGGAACGG CCTGTTGTTA CAAAAAGGAG ATTGACTCAG 420 

CTGCCCTTGG TGCATCTGAC TGACTATGAC TGCTGAGAGA TTCCAAGGAC CCTTAATGCC 480 

AGGGCTAACC TCTCCATGTG CAGTGAGACC TCTGGAGGAA GTGTCATCCT CTGGCTTTGT 540 

GTGGTACTCA TTATGGTGCA GTGCGGGCAT GAAATGAAGA CAC 583 
(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 

Met Cys Ser Glu Thr Ser Gly Gly Ser Val Ile Leu Trp Leu Cys Val
1 5 10 15

Val Leu Ile Met Val Gln Cys Gly His Glu Met Lys Thr
20 25

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 311 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:
CCCAAATAGG CTTACAGATA CGATATGTTT TAAATGTTTN GTATTTAACA AAAACATACT 60 
GACACTGTTT GGAAATGGCA ACAGGAAGAT AGCAAAATGA ATACTAACAT TACGAAAAGA 120 
TGAACAGGTA CATGTTCCAA GGCAGGTGGC TGTGAACTTC CTCTGAGTGA AGGCATCCCC 180 
TCCAGCACCT TTCAGCCTGC TAGTTAGGAC GACCCGCCGC CACCCTCCAG GACNTCCAGC 240 
CCTGCANTGC NTTTCTTTTN TTTTAAATAA TTCTTCATTG AGTTCTAATA TGTAAAAAAA 300 
AAAAAAAAAA A 311

(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 405 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:
AAGCTTGGCA CGAGGGCGGT TGAGGCCTTC GGTGGTGAAC GAGTCTCCAG CACCATGTCT 
GGTTTGTCTG GCCCACCAGC CCGGCGCGGC CCTTTTCCGT TAGCGTTGCT GCTTTTGTTC 
CTGCTCGGCC CCAGATTGGT CCTTGCCATC TCCTTCCATC TGCCCATTAA CTCTCGCAAG 
TGCCTCCGTG AGGAGATTCA CAAGGACCTG CTAGTGACTG GCGCGTACGA GATCTCCGAC 
CAGTCTGGGG GCGCTGGCGG CCTGCGCAGC CACCTCRAGA TCACAGATTC TGCTGGCCAT 
ATTCTCTACT CCAAAGAGGA TGCAACCAAG GGGAAATTTG CCTTTACCAC TGAAGATTAT 
GACATGTTTG AAGTGTGTTT TGAGAGCAAG GGAACAGGGC GGATA 
(2) INFORMATION FOR SEQ ID NO:42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 117 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:

Met Ser Gly Leu Ser Gly Pro Pro Ala Arg Arg Gly Pro Phe Pro Leu
1 5 10 15

Ala Leu Leu Leu Leu Phe Leu Leu Gly Pro Arg Leu Val Leu Ala Ile
20 25 30

Ser Phe His Leu Pro Ile Asn Ser Arg Lys Cys Leu Arg Glu Glu Ile
35 40 45

His Lys Asp Leu Leu Val Thr Gly Ala Tyr Glu Ile Ser Asp Gln Ser
50 55 60

Gly Gly Ala Gly Gly Leu Arg Ser His Leu Xaa Ile Thr Asp Ser Ala
65 70 75 80

Gly His Ile Leu Tyr Ser Lys Glu Asp Ala Thr Lys Gly Lys Phe Ala
85 90 95

Phe Thr Thr Glu Asp Tyr Asp Met Phe Glu Val Cys Phe Glu Ser Lys
100 105 110

Gly Thr Gly Arg Ile
115

(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 225 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 

TCTTTCAATT TACCTTGTGA AAACACCCTT AACTTTTTCT TNACCCTTAG CTGAAATGTT 60 

NACATAGCTT NTGGTGATAT CTTTTCATGA TTTTATATNT CTTAAAATGG TGATGGATGT 120 

GACACCTCAT AAAAGTGAGC TTTGAACTGT AGATAACTCT TAAAGAAAAT GTCATTTTAG 180 

ACAATTAAAA TATTTGTGCT CAAAAAAAAA AAAAAAAAAA AAAAA 225 
(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 525 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44: 

CGAGGGCAGG TCAGTCAGGT TCCTGGGCGC TCTGTTACAC AAGCAAGATA CAGCCAGCCC 60 

CACCTAATTT TGTTTCCCTG GCACCCTCCT GCTCAGTGCG ACATTGTCAC ACTTAACCCA 120 

TCTGTTTTCT CTAATGCACG ACAGATTCCT TTCAGACAGG ACAACTGTGA TATTTCAGTT 180 

CCTGATTGTA AATACCTCCT AAGCCTGAAG CTTCTGTTAC TAGCCATTGT GAGCTTCAGT 240 

TTCTTCATCT GCAAAATGGG CATAATACAA TCTATTCTTG CCACATCAAG GGATTGTTAT 300

TCCTTTAAAA AAAAACCAAT ACCAAAGAAG CCTACAATGT TGGCCTTAGC CAAAATTCTG 360

TTGATTTCAA CGTTGTTTTA TTCACTTCTA TCGGGGAGCC ATGGAAAAGA AAATCAAGAC 420 

ATACACACAA CACAGAACAT TGCAGAAGTT TTTAANACAA TGGAAAATAA ACCTATTTCT 480 

TTGGAAAGTG AAGCAAACTT AAACTCAGAT AAAGNAAATA TAACC 525 
(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 63 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 

Met Leu Ala Leu Ala Lys Ile Leu Leu Ile Ser Thr Leu Phe Tyr Ser
1 5 10 15

Leu Leu Ser Gly Ser His Gly Lys Glu Asn Gln Asp Ile His Thr Thr
20 25 30

Gln Asn Ile Ala Glu Val Phe Xaa Thr Met Glu Asn Lys Pro Ile Ser
35 40 45

Leu Glu Ser Glu Ala Asn Leu Asn Ser Asp Lys Xaa Asn Ile Thr
50 55 60

(2) INFORMATION FOR SEQ ID NO: 46:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 302 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 

TCAAAAGGTN ACACAAAATT ACTGTCACGT GGATTTTGTC AAGGAGAATC ATAAAAGCAG 60 

GAGACCAGTA GCAGAAATGT AGACAGGATG TATCATCCAA AGGTTTTCTT TCTTACAATT 120 

TTTGGCCATC CTGAGGCATT TACTAAGTAG CCTTAATTTG TATTTTAGTA GTATTTTCTT 180 

AGTAGAAAAT ATTTGTGGAA TCAGATAAAA CTAAAAGATT TCACCATTAC AGCCCTGCCT 240 

CATAACTAAA TAATAAAAAT TATTCCACCA AAAAATTNTA AAACAAAGNA AAAAAAAAAA 300

AA 302

(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 628 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47: 
CACGAGGTTT CAGACCAGCT TGTGTCAATA GGGTCCTACA GAGCAGCTGA TATCAGCAGT 
TTTACTAGTA TGCAGGACCT GAAAGAATAT CTCAAAGGGA AAACAATGTT TCATAATGTT 
CAGGAAGTTA TCTATAGAGC AGCTAAGGAG CTATAATCTT GTAACAGAGT CTACGTGATT 
GTAGGACAAT AGGCACCACA CAAATATGAG GAAGCAGGTC AGAGAGCGGG CTGACTTAAT 
GATTAATGCT GAATGTGCTA CAAGCTTGTT TCATTTTCAT TTCTCCTCCT CCCTTTTTTC 
CTGATTAATT TAATAAAGTT CATAGGGGAG GCTTCAAACA CATGAGAAAT TAAAACCTTT 
ATTACCAGAG TCAGAGCCTG ACTATATTGA TTGAGTGAAG CTTTCCTTTA TAAAATGCAA 
AGCATGTAAA CAATTCCAAC ACAGTAACAT ATTCATGAGT TTTTAAATTC ATGAGTTTTA
GAGAAAATAT TTTACTTAAA ACCAGCACTT GATGATCTCT GACAATGTTA TGTAGCCTGA
ACCTGGAGTT TTGGCTGATG GGTTGTCTCA GCCTGTGACA GGTTTTAGCT GGCTTTGGTT 
CATCTTGTAT CACACCCCCA CACTCACA 
(2) INFORMATION FOR SEQ ID NO: 48: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:

Pro Glu Pro Gly Val Leu Ala Asp Gly Leu Ser Gln Pro Val Thr Gly
1 5 10 15

Phe Ser Trp Leu Trp Phe Ile Leu Tyr His Thr Pro Thr Leu Thr
20 25 30

(2) INFORMATION FOR SEQ ID NO: 49:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 436 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 
AGCAGCTAAG GGGAAATAAT CTTGTAACAG GGTCTGGGTG ATTNTGAGGT AATAGGCCCC 
AAACAACCAT GGGGAAGCAG GTCAGAGGGC AAGCTGGCNT AGTGTTTAAC ATTGAATGGG 
CTGAAAGTTT GGTTNATTTT TGTTTCTTGT TTCTCCCCCT CCCTTCTNAC CTGAATAATT 
TTATGAAGTT TATAGGGATG GTTTCAGGAC CTCCATTCTA TCTGTTCCTG AAATATTACA 
AAAAGATTAT TATTGTAGCA CTNATNTAAT TGGGGTTTTA TTTCGTTGTT NGCATGTCTG 
TTTCTTCCCC AGTGAGTTGT AAATTGCTTA AGGGCAAACA GACGCATCCT ATTTATCTGT 
CTGTCACTAA CATTAAGCAC AGCATTTGGT ATACAGTCAT CACTCTAATA AAGTTTGAAA
436



(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 636 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
CACGAGGGAA AAAAAGAGTT TTTTTTTTAG ATCATCAGCT ATTGTTAGTG TTTGTGTATG 
TTATGTGTGG CTCAAGACAA CTTTGCTTCT TTTAATATAG GCAGGGAAGT CAAAAGATTG 
GATATCCCTG CTTTATACCA AGAAAGACAA CACCCCACAT TTGCAGTGCC TGAAAACACT 
ACCAGCCATC TGAAAAACAT GTGACTTCTA ACTTCTGTTC TTTTTTGTAG CAGTGGAATC 
CCACGGTGAT ATCTGAGGGA TGTGGTTACC TTTTGGAGGA GGTTGACGGT TTCTAAGGAT 
GATTCTTTCT GAGTGAAATA TTGTCAGTGT CATTGACCTT TTCATTATTT CAACTATTAT 
TATTCCAGGT TATCAATACT CTGGCTGACC ATCATCATCG TGAGACTGAC TTTGGTGTAG 
GAGTTCGAGA CCACCCTGGC CAACATGGCA AAACCCCATC TCCACAAAAA TTGGATAATT 
TGATAATTAT CATTATTGGG TTTCTGAGAC GTTACACATT TAACATTNTN TTCTGCACAA 
GTTGCCTTTG TGTGAGTATA CTAACTTTCT GTAGAGGTAN ACTTGTAATC ACAAATAAGA 
ATAAATTATA TAAAACAAAA AAAAAAAAAA AAAAAA 
(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 105 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 
Phe Phe Leu Ser Glu Ile Leu Ser Val Ser Leu Thr Phe Ser Leu Phe
1 5 10 15

Gln Leu Leu Leu Phe Gln Val Ile Asn Thr Leu Ala Asp His His His
20 25 30

Arg Glu Thr Asp Phe Gly Val Gly Val Arg Asp His Pro Gly Gln His
35 40 45

Gly Lys Thr Pro Ser Pro Gln Lys Leu Asp Asn Leu Ile Ile Ile Ile
50 55 60

Ile Gly Phe Leu Arg Arg Tyr Thr Phe Asn Ile Xaa Phe Cys Thr Ser
65 70 75 80

Cys Leu Cys Val Ser Ile Leu Thr Phe Cys Arg Gly Xaa Leu Val Ile
85 90 95

Thr Asn Lys Asn Lys Leu Tyr Lys Thr
100 105

(2) INFORMATION FOR SEQ ID NO: 52: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 536 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 

GGCACGAGGA GCGGGAGCTG GTGCCTTCCC GGAAGGGCTC AGAGGCGGGC TCGGGCAAGC 60 

ACTTTAACCT TTTAAGCCCA ACCAGATGAG TTGCCTGCAG TTTTGGAGGC CTTCAGAGCA 120 

TTTCACTAGA CCTCTGTCTG TGTCGGTCCA ATGTCTTTAG CCAAGCTTTG ATTAAAGATG 180 

ACTTCCTTGT TTGCTCAAGA AATTCGCCTT TCTAAAAGAC ATGAAGAAAT AGTATCACAA 240 

AGATTAATGT TACTTCAACA AATGGAGAAT AAATTGGGTG ATCAACACAC AGAAAAGGCA 300 

TCTCAACTCC AAACTGTTGA GACTGCTTTT AAAAGGAACC TTAGTCTTTT AAAGGATATA 360

GAAGCAGCAG AAAAGTCACT ACAGACCAGG ATTCACCCAC TTCCACGGCC TGAGGTGGTT 420

TCTCTTGAGA CTCGTTACTG GGCATCAGTA GAAGAATATA TTCCCAAATG GGAACAGTTT 480 

CTTTTAGGAA GAGCACCATA TCCTTTTGCT GTTGAAAATC AAAATGAAGC AGAAAA 536 
(2) INFORMATION FOR SEQ ID NO: 53:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 119 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:53: 

Met Thr Ser Leu Phe Ala Gln Glu Ile Arg Leu Ser Lys Arg His Glu
1 5 10 15

Glu Ile Val Ser Gln Arg Leu Met Leu Leu Gln Gln Met Glu Asn Lys
20 25 30

Leu Gly Asp Gln His Thr Glu Lys Ala Ser Gln Leu Gln Thr Val Glu
35 40 45

Thr Ala Phe Lys Arg Asn Leu Ser Leu Leu Lys Asp Ile Glu Ala Ala
50 55 60

Glu Lys Ser Leu Gln Thr Arg Ile His Pro Leu Pro Arg Pro Glu Val
65 70 75 80

Val Ser Leu Glu Thr Arg Tyr Trp Ala Ser Val Glu Glu Tyr Ile Pro
85 90 95

Lys Trp Glu Gln Phe Leu Leu Gly Arg Ala Pro Tyr Pro Phe Ala Val
100 105 110

Glu Asn Gln Asn Glu Ala Glu
115 

(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 79 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:
TTTATTTAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 60 
AAAAAAAAAA AAAAAAAAA

What is claimed is:

1. A composition comprising an isolated protein encoded by a polynucleotide
selected from the group consisting of: 

(a) a polynucleotide comprising the nucleotide sequence of SEQ ID NO:1;

(b) a polynucleotide comprising the nucleotide sequence of SEQ ID NO: 1 
from nucleotide 28 to nucleotide 276; 

(c) a polynucleotide comprising the nucleotide sequence of the full length 
protein coding sequence of clone AE402_1i deposited under accession number ATCC
98190;

(d) a polynucleotide encoding the full length protein encoded by the
cDNA insert of clone AE402_1i deposited under accession number ATCC 98190;

(e) a polynucleotide comprising the nucleotide sequence of the mature
protein coding sequence of clone AE402_1i deposited under accession number ATCC
98190;

(f) a polynucleotide encoding the mature protein encoded by the cDNA
insert of clone AE402_1i deposited under accession number ATCC 98190;

(g) a polynucleotide encoding a protein comprising the amino acid 
sequence of SEQ ID NO:2; 

(h) a polynucleotide encoding a protein comprising a fragment of the 
amino acid sequence of SEQ ID NO:2 having biological activity;

(i) a polynucleotide which is an allelic variant of a polynucleotide of (a)-
(f) above; and

(j) a polynucleotide which encodes a species homologue of the protein 
of (g) or (h) above. 

2. The composition of claim 1, further comprising a pharmaceutically acceptable
carrier.

3. A method for preventing, treating or ameliorating a medical condition which 
comprises administering to a mammalian subject a therapeutically effective amount of a 
composition of claim 2. 

4. A composition comprising a protein, wherein said protein comprises an amino 
acid sequence selected from the group consisting of:

(a) the amino acid sequence of SEQ ID NO:2;

(b) fragments of the amino acid sequence of SEQ ID NO:2; and

(c) the amino acid sequence encoded by the cDNA insert of clone
AE402_1i deposited under accession number ATCC 98190;

the protein being substantially free from other mammalian proteins. 

5. The composition of claim 4, wherein said protein comprises the amino acid 
sequence of SEQ ID NO:2.

6. The composition of claim 4, further comprising a pharmaceutically acceptable
carrier.

7. A method for preventing, treating or ameliorating a medical condition which 
comprises administering to a mammalian subject a therapeutically effective amount of a 
composition of claim 6. 

8. A composition comprising an isolated protein encoded by a polynucleotide 
selected from the group consisting of: 

(a) a polynucleotide comprising the nucleotide sequence of SEQ ID NO:4;

(b) a polynucleotide comprising the nucleotide sequence of SEQ ID NO:4 
from nucleotide 61 to nucleotide 513; 

(c) a polynucleotide comprising the nucleotide sequence of SEQ ID NO:4 
from nucleotide 322 to nucleotide 513; 

(d) a polynucleotide comprising the nucleotide sequence of the full length 
protein coding sequence of clone AE610_1i deposited under accession number ATCC
98190;

(e) a polynucleotide encoding the full length protein encoded by the
cDNA insert of clone AE610_1i deposited under accession number ATCC 98190;

(f) a polynucleotide comprising the nucleotide sequence of the mature
protein coding sequence of clone AE610_1i deposited under accession number ATCC
98190;

(g) a polynucleotide encoding the mature protein encoded by the cDNA
insert of clone AE610_1i deposited under accession number ATCC 98190;

(h) a polynucleotide encoding a protein comprising the amino acid
sequence of SEQ ID NO:5;

(i) a polynucleotide encoding a protein comprising a fragment of the
amino acid sequence of SEQ ID NO:5 having biological activity;

(j) a polynucleotide which is an allelic variant of a polynucleotide of (a)-
(g) above; and

(k) a polynucleotide which encodes a species homologue of the protein
of (h) or (i) above.



9. A composition comprising a protein, wherein said protein comprises an amino 
acid sequence selected from the group consisting of: 

(a) the amino acid sequence of SEQ ID NO:5;

(b) fragments of the amino acid sequence of SEQ ID NO:5; and

(c) the amino acid sequence encoded by the cDNA insert of clone
AE610_1i deposited under accession number ATCC 98190;

the protein being substantially free from other mammalian proteins. 

10. A composition comprising an isolated protein encoded by a polynucleotide 
selected from the group consisting of: 

(a) a polynucleotide comprising the nucleotide sequence of SEQ ID NO:7;

(b) a polynucleotide comprising the nucleotide sequence of SEQ ID NO:7 
from nucleotide 20 to nucleotide 523; 

(c) a polynucleotide comprising the nucleotide sequence of the full length 
protein coding sequence of clone AH106_1i deposited under accession number ATCC
98190;

(d) a polynucleotide encoding the full length protein encoded by the
cDNA insert of clone AH106_1i deposited under accession number ATCC 98190;

(e) a polynucleotide comprising the nucleotide sequence of the mature
protein coding sequence of clone AH106_1i deposited under accession number ATCC
98190;

(f) a polynucleotide encoding the mature protein encoded by the cDNA
insert of clone AH106_1i deposited under accession number ATCC 98190;

(g) a polynucleotide encoding a protein comprising the amino acid
sequence of SEQ ID NO:8;

(h) a polynucleotide encoding a protein comprising a fragment of the
amino acid sequence of SEQ ID NO:8 having biological activity;

(i) a polynucleotide which is an allelic variant of a polynucleotide of (a)-
(f) above; and

(j) a polynucleotide which encodes a species homologue of the protein
of (g) or (h) above.



11. A composition comprising a protein, wherein said protein comprises an amino 
acid sequence selected from the group consisting of: 

(a) the amino acid sequence of SEQ ID NO:8;

(b) fragments of the amino acid sequence of SEQ ID NO:8; and

(c) the amino acid sequence encoded by the cDNA insert of clone
AH106_1i deposited under accession number ATCC 98190;

the protein being substantially free from other mammalian proteins.

12. A composition comprising an isolated protein encoded by a polynucleotide 
selected from the group consisting of: 

(a) a polynucleotide comprising the nucleotide sequence of SEQ ID NO:9;

(b) a polynucleotide comprising the nucleotide sequence of SEQ ID NO:9
from nucleotide 130 to nucleotide 309;

(c) a polynucleotide comprising the nucleotide sequence of the full length
protein coding sequence of clone AH196_1i deposited under accession number ATCC
98190;

(d) a polynucleotide encoding the full length protein encoded by the
cDNA insert of clone AH196_1i deposited under accession number ATCC 98190;

(e) a polynucleotide comprising the nucleotide sequence of the mature
protein coding sequence of clone AH196_1i deposited under accession number ATCC
98190;

(f) a polynucleotide encoding the mature protein encoded by the cDNA
insert of clone AH196_1i deposited under accession number ATCC 98190;

(g) a polynucleotide encoding a protein comprising the amino acid
sequence of SEQ ID NO: 10;

(h) a polynucleotide encoding a protein comprising a fragment of the
amino acid sequence of SEQ ID NO: 10 having biological activity;

(i) a polynucleotide which is an allelic variant of a polynucleotide of (a)-
(f) above; and

(j) a polynucleotide which encodes a species homologue of the protein
of (g) or (h) above.



13. A composition comprising a protein, wherein said protein comprises an amino
acid sequence selected from the group consisting of:

(a) the amino acid sequence of SEQ ID NO: 10;

(b) fragments of the amino acid sequence of SEQ ID NO: 10; and

(c) the amino acid sequence encoded by the cDNA insert of clone
AH196_1i deposited under accession number ATCC 98190;

the protein being substantially free from other mammalian proteins. 

14. A composition comprising an isolated protein encoded by a polynucleotide 
selected from the group consisting of: 

(a) a polynucleotide comprising the nucleotide sequence of SEQ ID NO: 12;

(b) a polynucleotide comprising the nucleotide sequence of SEQ ID 
NO: 12 from nucleotide 69 to nucleotide 467; 

(c) a polynucleotide comprising the nucleotide sequence of the full length 
protein coding sequence of clone AI6_1i deposited under accession number ATCC
98190;

(d) a polynucleotide encoding the full length protein encoded by the
cDNA insert of clone AI6_1i deposited under accession number ATCC 98190;

(e) a polynucleotide comprising the nucleotide sequence of the mature
protein coding sequence of clone AI6_1i deposited under accession number ATCC
98190;

(f) a polynucleotide encoding the mature protein encoded by the cDNA
insert of clone AI6_1i deposited under accession number ATCC 98190;

(g) a polynucleotide encoding a protein comprising the amino acid
sequence of SEQ ID NO: 13;

(h) a polynucleotide encoding a protein comprising a fragment of the
amino acid sequence of SEQ ID NO: 13 having biological activity;

(i) a polynucleotide which is an allelic variant of a polynucleotide of (a)-
(f) above; and

(j) a polynucleotide which encodes a species homologue of the protein 
of (g) or (h) above.

15. A composition comprising a protein, wherein said protein comprises an amino
acid sequence selected from the group consisting of:

(a) the amino acid sequence of SEQ ID NO: 13;

(b) the amino acid sequence of SEQ ID NO: 13 from amino acid 69 to
amino acid 133;

(c) fragments of the amino acid sequence of SEQ ID NO: 13; and

(d) the amino acid sequence encoded by the cDNA insert of clone AI6_1i
deposited under accession number ATCC 98190; 

the protein being substantially free from other mammalian proteins. 

16. A composition comprising an isolated protein encoded by a polynucleotide 
selected from the group consisting of: 

(a) a polynucleotide comprising the nucleotide sequence of SEQ ID NO:16;

(b) a polynucleotide comprising the nucleotide sequence of SEQ ID
NO: 16 from nucleotide 55 to nucleotide 337;

(c) a polynucleotide comprising the nucleotide sequence of the full length
protein coding sequence of clone AJ13_1i deposited under accession number ATCC
98190;

(d) a polynucleotide encoding the full length protein encoded by the
cDNA insert of clone AJ13_1i deposited under accession number ATCC 98190;

(e) a polynucleotide comprising the nucleotide sequence of the mature
protein coding sequence of clone AJ13_1i deposited under accession number ATCC
98190;

(f) a polynucleotide encoding the mature protein encoded by the cDNA
insert of clone AJ13_1i deposited under accession number ATCC 98190;

(g) a polynucleotide encoding a protein comprising the amino acid
sequence of SEQ ID NO: 17;

(h) a polynucleotide encoding a protein comprising a fragment of the
amino acid sequence of SEQ ID NO: 17 having biological activity;

(i) a polynucleotide which is an allelic variant of a polynucleotide of (a)-
(f) above; and

(j) a polynucleotide which encodes a species homologue of the protein
of (g) or (h) above.

17. A composition comprising a protein, wherein said protein comprises an amino
acid sequence selected from the group consisting of: 

(a) the amino acid sequence of SEQ ID NO: 17; 

(b) the amino acid sequence of SEQ ID NO: 17 from amino acid 12 to 
amino acid 94; 

(c) fragments of the amino acid sequence of SEQ ID NO: 17; and 

(d) the amino acid sequence encoded by the cDNA insert of clone 
AJ13_1i deposited under accession number ATCC 98190;

the protein being substantially free from other mammalian proteins. 

18. A composition comprising an isolated protein encoded by a polynucleotide 
selected from the group consisting of: 

(a) a polynucleotide comprising the nucleotide sequence of SEQ ID NO: 19;

(b) a polynucleotide comprising the nucleotide sequence of SEQ ID 
NO: 19 from nucleotide 33 to nucleotide 422; 

(c) a polynucleotide comprising the nucleotide sequence of SEQ ID 
NO: 19 from nucleotide 114 to nucleotide 422;

(d) a polynucleotide comprising the nucleotide sequence of the full length 
protein coding sequence of clone AJ27_1i deposited under accession number ATCC
98190;

(e) a polynucleotide encoding the full length protein encoded by the
cDNA insert of clone AJ27_1i deposited under accession number ATCC 98190;

(f) a polynucleotide comprising the nucleotide sequence of the mature
protein coding sequence of clone AJ27_1i deposited under accession number ATCC
98190;

(g) a polynucleotide encoding the mature protein encoded by the cDNA
insert of clone AJ27_1i deposited under accession number ATCC 98190;

(h) a polynucleotide encoding a protein comprising the amino acid 
sequence of SEQ ID NO:20; 

(i) a polynucleotide encoding a protein comprising a fragment of the 
amino acid sequence of SEQ ID NO:20 having biological activity; 

(j) a polynucleotide which is an allelic variant of a polynucleotide of (a)- 
(g) above; and

(k) a polynucleotide which encodes a species homologue of the protein
of (h) or (i) above.



19. A composition comprising a protein, wherein said protein comprises an amino
acid sequence selected from the group consisting of: 

(a) the amino acid sequence of SEQ ID NO:20; 

(b) fragments of the amino acid sequence of SEQ ID NO:20; and

(c) the amino acid sequence encoded by the cDNA insert of clone
AJ27_1i deposited under accession number ATCC 98190;

the protein being substantially free from other mammalian proteins. 

20. A composition comprising an isolated protein encoded by a polynucleotide 
selected from the group consisting of: 



(a) a polynucleotide comprising the nucleotide sequence of SEQ ID NO:22;



(b) a polynucleotide comprising the nucleotide sequence of SEQ ID 
NO:22 from nucleotide 47 to nucleotide 517; 

(c) a polynucleotide comprising the nucleotide sequence of SEQ ID 
NO:22 from nucleotide 116 to nucleotide 517;

(d) a polynucleotide comprising the nucleotide sequence of the full length
protein coding sequence of clone AJ142_1i deposited under accession number ATCC
98190;

(e) a polynucleotide encoding the full length protein encoded by the
cDNA insert of clone AJ142_1i deposited under accession number ATCC 98190;

(f) a polynucleotide comprising the nucleotide sequence of the mature
protein coding sequence of clone AJ142_1i deposited under accession number ATCC
98190;

(g) a polynucleotide encoding the mature protein encoded by the cDNA
insert of clone AJ142_1i deposited under accession number ATCC 98190;

(h) a polynucleotide encoding a protein comprising the amino acid
sequence of SEQ ID NO:23;

(i) a polynucleotide encoding a protein comprising a fragment of the
amino acid sequence of SEQ ID NO:23 having biological activity;

(j) a polynucleotide which is an allelic variant of a polynucleotide of (a)-
(g) above; and

(k) a polynucleotide which encodes a species homologue of the protein
of (h) or (i) above.



21. A composition comprising a protein, wherein said protein comprises an amino
acid sequence selected from the group consisting of: 

(a) the amino acid sequence of SEQ ID NO:23; 

(b) fragments of the amino acid sequence of SEQ ID NO:23; and 

(c) the amino acid sequence encoded by the cDNA insert of clone 
AJ142_1i deposited under accession number ATCC 98190;

the protein being substantially free from other mammalian proteins. 

22. A composition comprising an isolated protein encoded by a polynucleotide 
selected from the group consisting of: 

(a) a polynucleotide comprising the nucleotide sequence of SEQ ID NO:24;

(b) a polynucleotide comprising the nucleotide sequence of SEQ ID 
NO:24 from nucleotide 312 to nucleotide 417; 

(c) a polynucleotide comprising the nucleotide sequence of the full length 
protein coding sequence of clone AK604_1i deposited under accession number ATCC
98190;

(d) a polynucleotide encoding the full length protein encoded by the
cDNA insert of clone AK604_1i deposited under accession number ATCC 98190;

(e) a polynucleotide comprising the nucleotide sequence of the mature
protein coding sequence of clone AK604_1i deposited under accession number ATCC
98190;

(f) a polynucleotide encoding the mature protein encoded by the cDNA
insert of clone AK604_1i deposited under accession number ATCC 98190;

(g) a polynucleotide encoding a protein comprising the amino acid 
sequence of SEQ ID NO:25; 

(h) a polynucleotide encoding a protein comprising a fragment of the 
amino acid sequence of SEQ ID NO:25 having biological activity; 

(i) a polynucleotide which is an allelic variant of a polynucleotide of (a)- 
(f) above; and

(j) a polynucleotide which encodes a species homologue of the protein
of (g) or (h) above.

23. A composition comprising a protein, wherein said protein comprises an amino
acid sequence selected from the group consisting of: 

(a) the amino acid sequence of SEQ ID NO:25; 

(b) fragments of the amino acid sequence of SEQ ID NO:25; and 

(c) the amino acid sequence encoded by the cDNA insert of clone 
AK604_1i deposited under accession number ATCC 98190;

the protein being substantially free from other mammalian proteins. 



24. A composition comprising an isolated protein encoded by a polynucleotide 
selected from the group consisting of: 

(a) a polynucleotide comprising the nucleotide sequence of SEQ ID NO:27;

(b) a polynucleotide comprising the nucleotide sequence of SEQ ID 
NO:27 from nucleotide 76 to nucleotide 372; 

(c) a polynucleotide comprising the nucleotide sequence of the full length 
protein coding sequence of clone AK620_1i deposited under accession number ATCC
98190;

(d) a polynucleotide encoding the full length protein encoded by the
cDNA insert of clone AK620_1i deposited under accession number ATCC 98190;

(e) a polynucleotide comprising the nucleotide sequence of the mature
protein coding sequence of clone AK620_1i deposited under accession number ATCC
98190;

(f) a polynucleotide encoding the mature protein encoded by the cDNA
insert of clone AK620_1i deposited under accession number ATCC 98190;

(g) a polynucleotide encoding a protein comprising the amino acid 
sequence of SEQ ID NO:28; 

(h) a polynucleotide encoding a protein comprising a fragment of the 
amino acid sequence of SEQ ID NO:28 having biological activity; 

(i) a polynucleotide which is an allelic variant of a polynucleotide of (a)- 
(f) above; and

(j) a polynucleotide which encodes a species homologue of the protein
of (g) or (h) above. 



25. A composition comprising a protein, wherein said protein comprises an amino 
acid sequence selected from the group consisting of:

(a) the amino acid sequence of SEQ ID NO:28;

(b) fragments of the amino acid sequence of SEQ ID NO:28; and

(c) the amino acid sequence encoded by the cDNA insert of clone
AK620_1i deposited under accession number ATCC 98190;

the protein being substantially free from other mammalian proteins. 

26. A composition comprising an isolated protein encoded by a polynucleotide 
selected from the group consisting of: 

(a) a polynucleotide comprising the nucleotide sequence of SEQ ID NO:29;

(b) a polynucleotide comprising the nucleotide sequence of SEQ ID 
NO:29 from nucleotide 367 to nucleotide 552; 

(c) a polynucleotide comprising the nucleotide sequence of the full length 
protein coding sequence of clone AK650_1i deposited under accession number ATCC
98190;

(d) a polynucleotide encoding the full length protein encoded by the
cDNA insert of clone AK650_1i deposited under accession number ATCC 98190;

(e) a polynucleotide comprising the nucleotide sequence of the mature
protein coding sequence of clone AK650_1i deposited under accession number ATCC
98190;

(f) a polynucleotide encoding the mature protein encoded by the cDNA
insert of clone AK650_1i deposited under accession number ATCC 98190;

(g) a polynucleotide encoding a protein comprising the amino acid 
sequence of SEQ ID NO:30; 

(h) a polynucleotide encoding a protein comprising a fragment of the 
amino acid sequence of SEQ ID NO:30 having biological activity; 

(i) a polynucleotide which is an allelic variant of a polynucleotide of (a)- 
(f) above; and

(j) a polynucleotide which encodes a species homologue of the protein
of (g) or (h) above.



27. A composition comprising a protein, wherein said protein comprises an amino
acid sequence selected from the group consisting of: 

(a) the amino acid sequence of SEQ ID NO:30; 

(b) fragments of the amino acid sequence of SEQ ID NO:30; and

(c) the amino acid sequence encoded by the cDNA [...]

28. A [...]

(a) a polynucleotide comprising the nucleotide sequence of SEQ ID NO:32;

(b) a polynucleotide comprising the nucleotide [...]

[...] deposited under accession number ATCC 98190;

(e) a polynucleotide encoding the full length [...] cDNA insert of clone
AM226_1i deposited under accession number ATCC 98190;

(f) a polynucleotide comprising the nucleotide sequence of the [...]
protein coding sequence of clone AM226_1i [...] number ATCC 98190;

[...] cDNA insert of clone AM226_1i deposited under accession number ATCC 98190;

[...] sequence of SEQ [...]

[...] having biological activity [...]

(a) the amino acid sequence of SEQ ID NO:33;

(b) fragments of the amino acid sequence of SEQ ID NO:33; and

(c) the amino acid sequence encoded by the cDNA insert of clone
AM226_1i deposited under accession number ATCC 98190;
the protein being substantially free from other mammalian proteins.

30. A composition comprising an isolated protein encoded by a polynucleotide 
selected from the group consisting of: 

(a) a polynucleotide comprising the nucleotide sequence of SEQ ID NO:35;

(b) a polynucleotide comprising the nucleotide sequence of SEQ ID 
NO:35 from nucleotide 281 to nucleotide 418;

(c) a polynucleotide comprising the nucleotide sequence of SEQ ID 
NO:35 from nucleotide 353 to nucleotide 418; 

(d) a polynucleotide comprising the nucleotide sequence of the full length 
protein coding sequence of clone AR417_1i deposited under accession number ATCC
98190;

(e) a polynucleotide encoding the full length protein encoded by the
cDNA insert of clone AR417_1i deposited under accession number ATCC 98190;

(f) a polynucleotide comprising the nucleotide sequence of the mature
protein coding sequence of clone AR417_1i deposited under accession number ATCC
98190;

(g) a polynucleotide encoding the mature protein encoded by the cDNA
insert of clone AR417_1i deposited under accession number ATCC 98190;

(h) a polynucleotide encoding a protein comprising the amino acid 
sequence of SEQ ID NO:36; 

(i) a polynucleotide encoding a protein comprising a fragment of the 
amino acid sequence of SEQ ID NO:36 having biological activity; 

(j) a polynucleotide which is an allelic variant of a polynucleotide of (a)-
(g) above; and 

(k) a polynucleotide which encodes a species homologue of the protein 
of (h) or (i) above. 



31. A composition comprising a protein, wherein said protein comprises an amino
acid sequence selected from the group consisting of: 

(a) the amino acid sequence of SEQ ID NO:36; 

(b) fragments of the amino acid sequence of SEQ ID NO:36; and

(c) the amino acid sequence encoded by the cDNA insert of clone
AR417_1i deposited under accession number ATCC 98190;
the protein being substantially free from other mammalian proteins.

32. A composition comprising an isolated protein encoded by a polynucleotide 
selected from the group consisting of: 

(a) a polynucleotide comprising the nucleotide sequence of SEQ ID NO:38;

(b) a polynucleotide comprising the nucleotide sequence of SEQ ID 
NO:38 from nucleotide 496 to nucleotide 583; 

(c) a polynucleotide comprising the nucleotide sequence of SEQ ID 
NO:38 from nucleotide 565 to nucleotide 583; 

(d) a polynucleotide comprising the nucleotide sequence of the full length
protein coding sequence of clone AU43_1i deposited under accession number ATCC
98190;

(e) a polynucleotide encoding the full length protein encoded by the
cDNA insert of clone AU43_1i deposited under accession number ATCC 98190;

(f) a polynucleotide comprising the nucleotide sequence of the mature
protein coding sequence of clone AU43_1i deposited under accession number ATCC
98190;

(g) a polynucleotide encoding the mature protein encoded by the cDNA
insert of clone AU43_1i deposited under accession number ATCC 98190;

(h) a polynucleotide encoding a protein comprising the amino acid
sequence of SEQ ID NO:39;

(i) a polynucleotide encoding a protein comprising a fragment of the
amino acid sequence of SEQ ID NO:39 having biological activity;

(j) a polynucleotide which is an allelic variant of a polynucleotide of (a)-
(g) above; and

(k) a polynucleotide which encodes a species homologue of the protein
of (h) or (i) above.



33. A composition comprising a protein, wherein said protein comprises an amino 
acid sequence selected from the group consisting of: 

(a) the amino acid sequence of SEQ ID NO:39; 

(b) fragments of the amino acid sequence of SEQ ID NO:39; and

(c) the amino acid sequence encoded by the cDNA insert of clone
AU43_1i deposited under accession number ATCC 98190;
the protein being substantially free from other mammalian proteins. 

34. A composition comprising an isolated protein encoded by a polynucleotide 
selected from the group consisting of: 

(a) a polynucleotide comprising the nucleotide sequence of SEQ ID NO:41;

(b) a polynucleotide comprising the nucleotide sequence of SEQ ID 
NO:41 from nucleotide 55 to nucleotide 405; 

(c) a polynucleotide comprising the nucleotide sequence of SEQ ID 
NO:41 from nucleotide 148 to nucleotide 405;

(d) a polynucleotide comprising the nucleotide sequence of the full length
protein coding sequence of clone AW60_1i deposited under accession number ATCC
98190;

(e) a polynucleotide encoding the full length protein encoded by the
cDNA insert of clone AW60_1i deposited under accession number ATCC 98190;

(f) a polynucleotide comprising the nucleotide sequence of the mature
protein coding sequence of clone AW60_1i deposited under accession number ATCC
98190;

(g) a polynucleotide encoding the mature protein encoded by the cDNA
insert of clone AW60_1i deposited under accession number ATCC 98190;

(h) a polynucleotide encoding a protein comprising the amino acid 
sequence of SEQ ID NO:42; 

(i) a polynucleotide encoding a protein comprising a fragment of the 
amino acid sequence of SEQ ID NO:42 having biological activity; 

(j) a polynucleotide which is an allelic variant of a polynucleotide of (a)- 
(g) above; and 

(k) a polynucleotide which encodes a species homologue of the protein 
of (h) or (i) above. 

35. A composition comprising a protein, wherein said protein comprises an amino 
acid sequence selected from the group consisting of: 

(a) the amino acid sequence of SEQ ID NO:42; 

(b) fragments of the amino acid sequence of SEQ ID NO:42; and

(c) the amino acid sequence encoded by the cDNA insert of clone
AW60_1i deposited under accession number ATCC 98190;
the protein being substantially free from other mammalian proteins. 

36. A composition comprising an isolated protein encoded by a polynucleotide 
selected from the group consisting of: 

(a) a polynucleotide comprising the nucleotide sequence of SEQ ID NO:44;

(b) a polynucleotide comprising the nucleotide sequence of SEQ ID 
NO:44 from nucleotide 337 to nucleotide 525; 

(c) a polynucleotide comprising the nucleotide sequence of SEQ ID 
NO:44 from nucleotide 406 to nucleotide 525; 

(d) a polynucleotide comprising the nucleotide sequence of the full length
protein coding sequence of clone BA176_1i deposited under accession number ATCC
98190;

(e) a polynucleotide encoding the full length protein encoded by the
cDNA insert of clone BA176_1i deposited under accession number ATCC 98190;

(f) a polynucleotide comprising the nucleotide sequence of the mature
protein coding sequence of clone BA176_1i deposited under accession number ATCC
98190;

(g) a polynucleotide encoding the mature protein encoded by the cDNA
insert of clone BA176_1i deposited under accession number ATCC 98190;

(h) a polynucleotide encoding a protein comprising the amino acid 
sequence of SEQ ID NO:45; 

(i) a polynucleotide encoding a protein comprising a fragment of the 
amino acid sequence of SEQ ID NO:45 having biological activity; 

(j) a polynucleotide which is an allelic variant of a polynucleotide of (a)-
(g) above; and

(k) a polynucleotide which encodes a species homologue of the protein
of (h) or (i) above. 

37. A composition comprising a protein, wherein said protein comprises an amino 
acid sequence selected from the group consisting of: 

(a) the amino acid sequence of SEQ ID NO:45;

(b) fragments of the amino acid sequence of SEQ ID NO:45; and

(c) the amino acid sequence encoded by the cDNA insert of clone
BA176_1i deposited under accession number ATCC 98190;
the protein being substantially free from other mammalian proteins. 

38. A composition comprising an isolated protein encoded by a polynucleotide 
selected from the group consisting of: 

(a) a polynucleotide comprising the nucleotide sequence of SEQ ID NO:47;

(b) a polynucleotide comprising the nucleotide sequence of SEQ ID 
NO:47 from nucleotide 536 to nucleotide 628; 

(c) a polynucleotide comprising the nucleotide sequence of the full length 
protein coding sequence of clone BD140_li deposited under accession number ATCC 
98190; 

(d) a polynucleotide encoding the full length protein encoded by the 
cDNA insert of clone BD140_li deposited under accession number ATCC 98190;

(e) a polynucleotide comprising the nucleotide sequence of the mature 
protein coding sequence of clone BD140_li deposited under accession number ATCC
98190; 

(f) a polynucleotide encoding the mature protein encoded by the cDNA
insert of clone BD140_li deposited under accession number ATCC 98190;

(g) a polynucleotide encoding a protein comprising the amino acid 
sequence of SEQ ID NO:48; 

(h) a polynucleotide encoding a protein comprising a fragment of the 
amino acid sequence of SEQ ID NO:48 having biological activity; 

(i) a polynucleotide which is an allelic variant of a polynucleotide of (a)- 
(f) above; and

(j) a polynucleotide which encodes a species homologue of the protein 
of (g) or (h) above. 

39. A composition comprising a protein, wherein said protein comprises an amino 
acid sequence selected from the group consisting of: 

(a) the amino acid sequence of SEQ ID NO:48; 

(b) fragments of the amino acid sequence of SEQ ID NO:48; and

(c) the amino acid sequence encoded by the cDNA insert of clone 
BD140_li deposited under accession number ATCC 98190;
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NO:50 from nucleotide 345 to nucJeoUde 



617;

... deposited under accession number ATCC 98190;

(e) a polynucleotide encoding the full ... deposited under accession number ATCC 98190;

... insert of clone BD407_li deposited under accession number ATCC 98190;

... SEQ ID NO:51 having biological activity ...

(a) the amino acid sequence of SEQ ID NO:51;

(b) the amino acid sequence of SEQ ID NO:51 from amino acid 1 to
amino acid 32;

(c) fragments of the amino acid sequence of SEQ ID NO:51; and

(d) the amino acid sequence encoded by the cDNA insert of clone
BD407_li deposited under accession number ATCC 98190; 
the protein being substantially free from other mammalian proteins. 

42. A composition comprising an isolated protein encoded by a polynucleotide 
selected from the group consisting of: 

(a) a polynucleotide comprising the nucleotide sequence of SEQ ID
NO:52;

(b) a polynucleotide comprising the nucleotide sequence of SEQ ID 
NO:52 from nucleotide 178 to nucleotide 534; 

(c) a polynucleotide comprising the nucleotide sequence of the full length 
protein coding sequence of clone BF290_li deposited under accession number ATCC
98190; 

(d) a polynucleotide encoding the full length protein encoded by the 
cDNA insert of clone BF290_li deposited under accession number ATCC 98190; 

(e) a polynucleotide comprising the nucleotide sequence of the mature
protein coding sequence of clone BF290_li deposited under accession number ATCC
98190; 

(f) a polynucleotide encoding the mature protein encoded by the cDNA
insert of clone BF290_li deposited under accession number ATCC 98190;

(g) a polynucleotide encoding a protein comprising the amino acid 
sequence of SEQ ID NO:53; 

(h) a polynucleotide encoding a protein comprising a fragment of the
amino acid sequence of SEQ ID NO:53 having biological activity; 

(i) a polynucleotide which is an allelic variant of a polynucleotide of (a)- 
(f) above; and

(j) a polynucleotide which encodes a species homologue of the protein 
of (g) or (h) above. 

43. A composition comprising a protein, wherein said protein comprises an amino 
acid sequence selected from the group consisting of: 

(a) the amino acid sequence of SEQ ID NO:53; 

(b) fragments of the amino acid sequence of SEQ ID NO:53; and 

(c) the amino acid sequence encoded by the cDNA insert of clone 
BF290_li deposited under accession number ATCC 98190; 

the protein being substantially free from other mammalian proteins. 
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FIGURE lA 



HindIII 1

BamHI 2772 ClaI 2666



Plasmid name: pED6dpc2 
Plasmid size: 5374 bp 



Comments/References: pED6dpc2 is derived from pED6dpc1 by insertion of a new
polylinker to facilitate cDNA cloning. SST cDNAs are cloned between EcoRI and NotI.
pED vectors are described in Kaufman et al. (1991), NAR 19: 4485-4490.
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FIGURE IB 




Plasmid name: pNOTs
Plasmid size: 4529 bp



Comments/References: pNOTs is a derivative of pMT2 (Kaufman et al., 1989, Mol. Cell. Biol. 9: 1741-1750).
DHFR was deleted and a new polylinker was inserted between EcoRI and HpaI. M13 origin
of replication was inserted in the ClaI site. SST cDNAs are cloned between EcoRI and
NotI.



Fig. 2

Fig. 3

Fig. 4

Fig. 5

Fig. 6

Fig. 7

Fig. 8



PCT

WORLD INTELLECTUAL PROPERTY ORGANIZATION
International Bureau




INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(51) International Patent Classification:
C12Q 1/68

A2

(11) International Publication Number: WO 98/20165

(43) International Publication Date: 14 May 1998 (14.05.98)

(21) International Application Number: PCT/US97/20313

(22) International Filing Date: 5 November 1997 (05.11.97)

(30) Priority Data:
60/030,455 6 November 1996 (06.11.96) US



(71) Applicant (for all designated States except US): WHITEHEAD 

INSTITUTE FOR BIOMEDICAL RESEARCH [US/US]; 
Nine Cambridge Center, Cambridge, MA 02142 (US). 

(72) Inventors; and 

(75) Inventors/Applicants (for US only): LANDER, Eric, S.
[US/US]; 151 Bishop Allen Drive, Cambridge, MA 02138
(US). WANG, David [CN/US]; Apartment 314, 276 Massachusetts
Avenue, Arlington, MA 02173 (US). HUDSON,
Thomas [CA/US]; 361 Metcalfe Avenue, Westmount,
Quebec H3Z 2J2 (CA).

(74) Agents: GRANAHAN, Patricia et al.; Hamilton, Brook, Smith 
& Reynolds, Two Militia Drive, Lexington, MA 02173 
(US). 



(81) Designated States: JP, US, European patent (AT, BE, CH, DE,
DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE).



Published 

Without international search report and to be republished 
upon receipt of that report. 



(54) Title: BIALLELIC MARKERS 
(57) Abstract 

The invention provides nucleic acid segments of the human genome including polymorphic sites. Allele-specific primers and probes
hybridizing to regions flanking these sites are also provided. The nucleic acids, primers and probes are used in applications such as forensics,
paternity testing, medicine and genetic analysis.






BIALLELIC MARKERS



RELATED APPLICATIONS 

This application claims priority to U.S. provisional 
application Serial No. 60/030,455, filed November 6, 1996,
the entire teachings of which are incorporated herein by
reference.

BACKGROUND OF THE INVENTION 

The genomes of all organisms undergo spontaneous
mutation in the course of their continuing evolution,
generating variant forms of progenitor sequences (Gusella,
Ann. Rev. Biochem. 55, 831-854 (1986)). The variant form
may confer an evolutionary advantage or disadvantage
relative to a progenitor form or may be neutral. In some
instances, a variant form confers a lethal disadvantage and
is not transmitted to subsequent generations of the
organism. In other instances, a variant form confers an
evolutionary advantage to the species and is eventually
incorporated into the DNA of many or most members of the
species and effectively becomes the progenitor form. In
many instances, both progenitor and variant form(s) survive
and co-exist in a species population. The coexistence of
multiple forms of a sequence gives rise to polymorphisms.

Several different types of polymorphism have been
reported. A restriction fragment length polymorphism
(RFLP) is a variation in DNA sequence that alters the
length of a restriction fragment (Botstein et al., Am. J.
Hum. Genet. 32, 314-331 (1980)). The restriction fragment
length polymorphism may create or delete a restriction
site, thus changing the length of the restriction fragment.
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RFLPs have been widely used in human and animal genetic 
analyses (see WO 90/13668; WO 90/11369; Donis-Keller, Cell
51, 319-337 (1987); Lander et al., Genetics 121, 85-99
(1989)). When a heritable trait can be linked to a
particular RFLP, the presence of the RFLP in an individual
can be used to predict the likelihood that the animal will
also exhibit the trait.

Other polymorphisms take the form of short tandem
repeats (STRs) that include tandem di-, tri- and
tetranucleotide repeated motifs. These tandem repeats are also
referred to as variable number tandem repeat (VNTR)
polymorphisms. VNTRs have been used in identity and
paternity analysis (US 5,075,217; Armour et al., FEBS Lett.
307, 113-115 (1992); Horn et al., WO 91/14003; Jeffreys, EP
370,719), and in a large number of genetic mapping studies.

Other polymorphisms take the form of single nucleotide
variations between individuals of the same species. Such
polymorphisms are far more frequent than RFLPs, STRs and
VNTRs. Some single nucleotide polymorphisms occur in
protein-coding sequences, in which case one of the
polymorphic forms may give rise to the expression of a
defective or other variant protein and, potentially, a
genetic disease. Examples of genes in which polymorphisms
within coding sequences give rise to genetic disease
include β-globin (sickle cell anemia) and CFTR (cystic
fibrosis). Other single nucleotide polymorphisms occur in
noncoding regions. Some of these polymorphisms may also
result in defective protein expression (e.g., as a result
of defective splicing). Other single nucleotide
polymorphisms have no phenotypic effects.

Single nucleotide polymorphisms can be used in the same 
manner as RFLPs and VNTRs, but offer several advantages. 
Single nucleotide polymorphisms occur with greater 
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frequency and are spaced more uniformly throughout the 
genome than other forms of polymorphism. The greater
frequency and uniformity of single nucleotide polymorphisms
means that there is a greater probability that such a
polymorphism will be found in close proximity to a genetic
locus of interest than would be the case for other
polymorphisms. The different forms of characterized single
nucleotide polymorphisms are often easier to distinguish
than other types of polymorphism (e.g., by use of assays
employing allele-specific hybridization probes or primers).

Only a small percentage of the total repository of
polymorphisms in humans and other organisms has been
identified. The limited number of polymorphisms identified
to date is due to the large amount of work required for
their detection by conventional methods. For example, a
conventional approach to identifying polymorphisms might be
to sequence the same stretch of DNA in a population of
individuals by dideoxy sequencing. In this type of
approach, the amount of work increases in proportion to
both the length of sequence and the number of individuals
in a population and becomes impractical for large stretches
of DNA or large numbers of persons.
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SUMMARY OF THE INVENTION 

The invention provides nucleic acid sequences
comprising nucleic acid segments of from about 10 to about
200 bases as shown in the Table, column 7, including a
polymorphic site. Complements of these segments are also
included. The segments can be DNA or RNA, and can be
double- or single-stranded. Segments can be, for example,
10-20, 10-50 or 10-100 bases long. Preferred segments
include a biallelic polymorphic site. The base occupying
the polymorphic site in the segments can be the reference
(Table, column 3) or an alternative base (Table, column 4).

The invention further provides allele-specific
oligonucleotides that hybridize to a segment of a fragment
shown in the Table, column 7, or its complement. These
oligonucleotides can be probes or primers. Also provided
are isolated nucleic acids comprising a sequence shown in
the Table, column 7, or the complement thereto, in which
the polymorphic site within the sequence is occupied by a
base other than the reference base shown in the Table,
column 3.

The invention further provides a method of analyzing a
nucleic acid from an individual. The method determines
which base is present at any one of the polymorphic sites
shown in the Table. Optionally, a set of bases occupying a
set of the polymorphic sites shown in the Table is
determined. This type of analysis can be performed on a
number of individuals, who are tested for the presence of a
disease phenotype. The presence or absence of disease
phenotype is then correlated with a base or set of bases
present at the polymorphic sites in the individuals tested.
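The correlation step described above can be illustrated with a minimal sketch. It assumes a single biallelic site, diploid genotypes written as two-letter strings, and a plain Pearson chi-square statistic on allele counts; the text does not prescribe any particular statistic, and the genotype data and function names below are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 d.f., no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a nonzero total")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def allele_counts(genotypes, allele):
    """Count copies of `allele`, and of all other alleles, in diploid
    genotypes given as two-character strings such as 'AG'."""
    hits = sum(g.count(allele) for g in genotypes)
    return hits, 2 * len(genotypes) - hits


# Hypothetical genotypes at one site (reference base A, alternative base G).
affected = ["GG", "AG", "GG", "AG", "GG"]
unaffected = ["AA", "AG", "AA", "AA", "AG"]

a, b = allele_counts(affected, "G")    # G and non-G alleles in affected
c, d = allele_counts(unaffected, "G")  # G and non-G alleles in unaffected
statistic = chi_square_2x2(a, b, c, d)
```

A larger statistic indicates a stronger association between the base at the site and the phenotype in the individuals tested; a real study would use far more individuals than this toy data set.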
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DETAILED DESCRIPTION OF THE INVENTION 
DEFINITIONS 

An oligonucleotide can be DNA or RNA, and single- or
double-stranded. Oligonucleotides can be naturally
occurring or synthetic, but are typically prepared by
synthetic means. The oligonucleotides of the present
invention can comprise all of an oligonucleotide sequence
presented in column 7 of the Table or a segment of such an
oligonucleotide which includes a polymorphic site.
Oligonucleotides can be all of a nucleic acid segment as
represented in column 7 of the Table; a nucleic acid
sequence which comprises a nucleic acid segment represented
in column 7 of the Table and additional nucleic acids
(present at either or both ends of a nucleic acid segment
of column 7); or a portion (fragment) of a nucleic acid
segment represented in column 7 of the Table which includes
a polymorphic site. Preferred oligonucleotides of the
invention include segments of DNA, or their complements,
which include any one of the polymorphic sites shown in the
Table. The segments can be between 5 and 250 bases, and,
in specific embodiments, are between 5-10, 5-20, 10-20,
10-50, 20-50 or 10-100 bases. The polymorphic site can occur
within any position of the segment. The segments can be
from any of the allelic forms of DNA shown in the Table.

Hybridization probes are oligonucleotides which bind in
a base-specific manner to a complementary strand of nucleic
acid. Such probes include peptide nucleic acids, as
described in Nielsen et al., Science 254, 1497-1500 (1991).

As used herein, the term primer refers to a single-stranded
oligonucleotide which acts as a point of
initiation of template-directed DNA synthesis under
appropriate conditions (e.g., in the presence of four
different nucleoside triphosphates and an agent for
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polymerization, such as, DNA or RNA polymerase or reverse 
transcriptase) in an appropriate buffer and at a suitable
temperature. The appropriate length of a primer depends on
the intended use of the primer, but typically ranges from
15 to 30 nucleotides. Short primer molecules generally
require cooler temperatures to form sufficiently stable
hybrid complexes with the template. A primer need not
reflect the exact sequence of the template, but must be
sufficiently complementary to hybridize with a template.
The term primer site refers to the area of the target DNA
to which a primer hybridizes. The term primer pair refers
to a set of primers including a 5' (upstream) primer that
hybridizes with the 5' end of the DNA sequence to be
amplified and a 3' (downstream) primer that hybridizes with
the complement of the 3' end of the sequence to be
amplified.

As used herein, linkage describes the tendency of
genes, alleles, loci or genetic markers to be inherited
together as a result of their location on the same
chromosome. It can be measured by percent recombination
between the two genes, alleles, loci or genetic markers.

As used herein, polymorphism refers to the occurrence
of two or more genetically determined alternative sequences
or alleles in a population. A polymorphic marker or site
is the locus at which divergence occurs. Preferred markers
have at least two alleles, each occurring at a frequency of
greater than 1%, and more preferably greater than 10% or
20% of a selected population. A polymorphic locus may be
as small as one base pair. Polymorphic markers include
restriction fragment length polymorphisms, variable number
of tandem repeats (VNTR's), hypervariable regions,
minisatellites, dinucleotide repeats, trinucleotide
repeats, tetranucleotide repeats, simple sequence repeats,
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and insertion elements such as Alu. ' The first identified 
allelic form is arbitrarily designated as the reference
form and other allelic forms are designated as alternative
or variant alleles. The allelic form occurring most
frequently in a selected population is sometimes referred
to as the wildtype form. Diploid organisms may be
homozygous or heterozygous for allelic forms. A diallelic
or biallelic polymorphism has two forms. A triallelic
polymorphism has three forms.
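The frequency criterion for preferred markers can be sketched as follows. This is an illustration only: it assumes genotypes are given as two-letter strings for a diploid sample, and the function names are hypothetical rather than taken from the text.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: frequency} from diploid genotypes such as 'AG'."""
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_preferred_marker(freqs, threshold=0.01):
    """True if at least two alleles each exceed `threshold` (1% by default;
    0.10 or 0.20 give the more preferred cutoffs)."""
    return sum(1 for f in freqs.values() if f > threshold) >= 2


# Hypothetical sample of four individuals at one polymorphic site.
freqs = allele_frequencies(["AA", "AG", "GG", "AA"])
# Hypothetical sample in which the G allele is rare (1 copy in 400).
rare = allele_frequencies(["AA"] * 199 + ["AG"])
```

With two alleles observed, the first sample describes a biallelic site that passes even the 20% cutoff, whereas the second falls below the 1% cutoff.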

A single nucleotide polymorphism occurs at a
polymorphic site occupied by a single nucleotide, which is
the site of variation between allelic sequences. The site
is usually preceded by and followed by highly conserved
sequences of the allele (e.g., sequences that vary in less
than 1/100 or 1/1000 members of the populations).

A single nucleotide polymorphism usually arises due to
substitution of one nucleotide for another at the
polymorphic site. A transition is the replacement of one
purine by another purine or one pyrimidine by another
pyrimidine. A transversion is the replacement of a purine
by a pyrimidine or vice versa. Single nucleotide
polymorphisms can also arise from a deletion of a
nucleotide or an insertion of a nucleotide relative to a
reference allele. Typically the polymorphic site is
occupied by a base other than the reference base. For
example, where the reference allele contains the base "T"
at the polymorphic site, the altered allele can contain a
"C", "G" or "A" at the polymorphic site.

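The transition/transversion distinction defined above reduces to a small lookup, sketched here under the assumption that bases are given as single DNA letters; the function name is hypothetical.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(ref, alt):
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and alternative bases are identical")
    pair = {ref, alt}
    # Purine<->purine or pyrimidine<->pyrimidine is a transition.
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# The example from the text: reference "T", altered allele "C", "G" or "A".
kinds = {alt: classify_substitution("T", alt) for alt in "CGA"}
```

For the reference base "T", only the change to "C" is a transition; changes to "G" or "A" are transversions.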
Hybridizations are usually performed under stringent
conditions, for example, at a salt concentration of no more
than 1 M and a temperature of at least 25°C. For example,
conditions of 5X SSPE (750 mM NaCl, 50 mM NaPhosphate, 5 mM
EDTA, pH 7.4) and a temperature of 25-30°C, or equivalent
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conditions, are suitable for allele-specif ic probe 
hybridizations. Equivalent conditions can be determined by 
varying one or more of the parameters given as an example, 
as known in the art, while maintaining a similar degree of 
identity or similarity between the target nucleotide
sequence and the primer or probe used.
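As a worked check of the buffer arithmetic, the sketch below scales a 20X SSPE stock to a working strength. The stock composition (3 M NaCl, 0.2 M sodium phosphate, 0.02 M EDTA) is an assumption based on the conventional recipe and is not stated in the text; reading "salt concentration" as the NaCl concentration is likewise an assumption.

```python
# Assumed composition of a 20X SSPE stock, in mM (not given in the text).
STOCK_20X_MM = {"NaCl": 3000.0, "NaPhosphate": 200.0, "EDTA": 20.0}


def sspe_concentrations(fold):
    """Component concentrations (mM) of SSPE diluted to `fold`-X strength."""
    return {name: mm * fold / 20.0 for name, mm in STOCK_20X_MM.items()}


def meets_example_conditions(fold, temp_c):
    """Check the example limits from the text: salt (taken here as NaCl)
    of no more than 1 M and a temperature of at least 25 degrees C."""
    return sspe_concentrations(fold)["NaCl"] <= 1000.0 and temp_c >= 25


five_x = sspe_concentrations(5)
```

Under these assumptions, 5X SSPE reproduces the 750 mM NaCl, 50 mM phosphate and 5 mM EDTA quoted above, and 5X SSPE at 25-30°C falls within the stated limits.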

The term "isolated" is used herein to indicate that the
material in question exists in a physical milieu distinct
from that in which it occurs in nature. For example, an
isolated nucleic acid of the invention may be substantially
isolated with respect to the complex cellular milieu in
which it naturally occurs. In some instances, the isolated
material will form part of a composition (for example, a
crude extract containing other substances), buffer system
or reagent mix. In other circumstances, the material may be
purified to essential homogeneity, for example as
determined by PAGE or column chromatography such as HPLC.
Preferably, an isolated nucleic acid comprises at least
about 50, 80 or 90 percent (on a molar basis) of all
macromolecular species present.

I. Novel Polymorphisms of the Invention 

The novel polymorphisms of the invention are listed in
the Table. The first column of the Table lists the names
assigned to the fragments in which the polymorphisms occur.
The fragments are all human genomic fragments. The
sequence of one allelic form of each of the fragments
(arbitrarily referred to as the prototypical or reference
form) has been previously published. These sequences are
listed at http://www-genome.wi.mit.edu/ (all STS's
(sequence tag sites)); http://shgc.stanford.edu (Stanford
STS's); and http://www.tigr.org/ (TIGR STS's). The Web
sites also list primers for amplification of the fragments,
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and the genomic location of fragments. Some fragments are 
expressed sequence tags, and some are random genomic
fragments. All information in the websites concerning the
fragments listed in the Table is incorporated by reference
in its entirety for all purposes.

The second column lists the position in the fragment in
which a polymorphic site has been found. Positions are
numbered consecutively with the first base of the fragment
sequence as listed in one of the above databases being
assigned the number one. The third column lists the base
occupying the polymorphic site in the sequence in the data
base. This base is arbitrarily designated the reference or
prototypical form, but it is not necessarily the most
frequently occurring form. The fourth column in the Table
lists the alternative base(s) at the polymorphic site. The
fifth column of the Table lists a 5' (upstream or forward)
primer that hybridizes with the 5' end of the DNA sequence
to be amplified. The sixth column of the Table lists a 3'
(downstream or reverse) primer that hybridizes with the
complement of the 3' end of the sequence to be amplified.
The seventh column of the Table lists a number of bases of
sequence on either side of the polymorphic site in each
fragment. The indicated sequences can be either DNA or
RNA. In the latter, the T's shown in the Table are
replaced by U's. The base occupying the polymorphic site
is indicated in IUPAC-IUB ambiguity code.
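The column conventions above (1-based positions, a reference base plus alternative base(s), an ambiguity code at the polymorphic site, and T replaced by U for RNA) can be sketched as below. The ambiguity table is the standard IUPAC-IUB nucleotide code; the example sequence and the function names are hypothetical and are not entries from the Table.

```python
# Standard IUPAC-IUB nucleotide ambiguity codes, keyed by the set of bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def ambiguity_code(reference, alternatives):
    """Code covering the reference base (column 3) and alternatives (column 4)."""
    return IUPAC[frozenset(reference.upper()) | frozenset(alternatives.upper())]


def mark_site(sequence, position, alternatives):
    """Replace the base at 1-based `position` with its ambiguity code."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position lies outside the sequence")
    i = position - 1  # the first base of the fragment is numbered one
    return sequence[:i] + ambiguity_code(sequence[i], alternatives) + sequence[i + 1:]


def as_rna(sequence):
    """RNA form of a DNA sequence: every T is replaced by U."""
    return sequence.replace("T", "U")


# Hypothetical fragment with a T/C polymorphism at position 4.
marked = mark_site("ACGTTGCA", 4, "C")
rna = as_rna("ACGTTGCA")
```

Here the T/C site is written as Y, so the hypothetical fragment reads ACGYTGCA in the style of column 7.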

II. Analysis of Polymorphisms 
A. Preparation of Samples 

Polymorphisms are detected in a target nucleic acid
from an individual being analyzed. For assay of genomic
DNA, virtually any biological sample (other than pure red
blood cells) is suitable. For example, convenient tissue
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sampies include whole blood, semen; saliva, tears, urine, 
fecal material, sweat, buccal, skin and hair. For assay of 
cDNA or mRNA, the tissue sample must be obtained from an
organ in which the target nucleic acid is expressed. For
example, if the target nucleic acid is a cytochrome P450,
the liver is a suitable source.

Many of the methods described below require
amplification of DNA from target samples. This can be
accomplished by, e.g., PCR. See generally PCR Technology:
Principles and Applications for DNA Amplification (ed. H.A.
Erlich, Freeman Press, NY, NY, 1992); PCR Protocols: A
Guide to Methods and Applications (eds. Innis et al.,
Academic Press, San Diego, CA, 1990); Mattila et al.,
Nucleic Acids Res. 19, 4967 (1991); Eckert et al., PCR
Methods and Applications 1, 17 (1991); PCR (eds. McPherson
et al., IRL Press, Oxford); and U.S. Patent 4,683,202.

Other suitable amplification methods include the ligase
chain reaction (LCR) (see Wu and Wallace, Genomics 4, 560
(1989); Landegren et al., Science 241, 1077 (1988)),
transcription amplification (Kwoh et al., Proc. Natl. Acad.
Sci. USA 86, 1173 (1989)), self-sustained sequence
replication (Guatelli et al., Proc. Natl. Acad. Sci. USA
87, 1874 (1990)) and nucleic acid based sequence
amplification (NASBA). The latter two amplification
methods involve isothermal reactions based on isothermal
transcription, which produce both single stranded RNA
(ssRNA) and double stranded DNA (dsDNA) as the
amplification products in a ratio of about 30 or 100 to 1,
respectively.

B. Detection of Polymorphisms in Target DNA

There are two distinct types of analysis of target DNA
for detecting polymorphisms. The first type of analysis,
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sometimes referred to as de novo characterizacion , is 
carried out to identify polymorphic sites not previously 
characterized (i.e., to identify new polymorphisms). This 
analysis compares target sequences in different individuals 
to identify points of variation, i.e., polymorphic sites. 
By analyzing groups of individuals representing the 
greatest ethnic diversity among humans and greatest breed 
and species variety in plants and animals, patterns 
characteristic of the most common alleles/haplotypes of the 
locus can be identified, and the frequencies of such 
alleles/haplotypes in the population can be determined. 
Additional allelic frequencies can be determined for
subpopulations characterized by criteria such as geography,
race, or gender. The de novo identification of
polymorphisms of the invention is described in the Examples
section. The second type of analysis determines which
form(s) of a characterized (known) polymorphism are present
in individuals under test. There are a variety of suitable
procedures, which are discussed in turn.
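The first type of analysis, comparing the same target sequence across individuals to find points of variation, can be sketched as a column-by-column comparison. This assumes the sequences are already aligned and of equal length, and that "N" marks an uncalled base; the sequences and the function name are hypothetical.

```python
def polymorphic_sites(sequences):
    """Return {1-based position: sorted bases} for every aligned column
    in which more than one base is observed. 'N' (no call) is ignored."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    sites = {}
    for position, column in enumerate(zip(*sequences), start=1):
        bases = set(column) - {"N"}
        if len(bases) > 1:
            sites[position] = sorted(bases)
    return sites


# Hypothetical reads of the same fragment from four individuals.
individuals = ["ACGTAC", "ACGTAC", "ACATAC", "ACGTAT"]
sites = polymorphic_sites(individuals)
```

Positions 3 and 6 vary in this toy data; the second type of analysis would then ask which of the observed bases a given individual carries at those positions.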



1. Allele-Specific Probes

The design and use of allele-specific probes for
analyzing polymorphisms is described by, e.g., Saiki et al.,
Nature 324, 163-166 (1986); Dattagupta, EP 235,726; Saiki,
WO 89/11548. Allele-specific probes can be designed that
hybridize to a segment of target DNA from one individual
but do not hybridize to the corresponding segment from
another individual due to the presence of different
polymorphic forms in the respective segments from the two
individuals. Hybridization conditions should be
sufficiently stringent that there is a significant
difference in hybridization intensity between alleles, and
preferably an essentially binary response, whereby a probe
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hybridizes to only one of the alleles. Some probes are 
designed to hybridize to a segment of target DNA such that 
the polymorphic site aligns with a central position (e.g., 
in a 15-mer at the 7 position; in a 16-mer, at either the 8
or 9 position) of the probe. This design of probe achieves
good discrimination in hybridization between different
allelic forms.
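A probe pair with the polymorphic site at a chosen central position can be derived from a reference sequence as sketched below. The target sequence, the site position and the function name are hypothetical. The probes are written in the same sense as the input strand, so they would hybridize to its complementary strand. The default of position 7 in a 15-mer follows the example in the text.

```python
def probe_pair(target, site_pos, alt_base, length=15, site_index=7):
    """Return (reference_probe, variant_probe).

    target     -- reference sequence of the fragment
    site_pos   -- 1-based position of the polymorphic site in `target`
    alt_base   -- alternative base at the site
    site_index -- 1-based position the site should occupy within the probe
    """
    start = site_pos - site_index  # 0-based start of the probe window
    end = start + length
    if start < 0 or end > len(target):
        raise ValueError("probe window extends beyond the target sequence")
    ref_probe = target[start:end]
    alt_probe = ref_probe[:site_index - 1] + alt_base + ref_probe[site_index:]
    return ref_probe, alt_probe


# Hypothetical 20-base target with a C/T polymorphism at position 10.
target = "GATTACAGGCTTACGATCCA"
ref_probe, alt_probe = probe_pair(target, site_pos=10, alt_base="T")
```

The two probes differ only at the seventh base, which is the arrangement the text describes for pairing a reference-matched probe with a variant-matched one.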

Allele-specific probes are often used in pairs, one
member of a pair showing a perfect match to a reference
form of a target sequence and the other member showing a
perfect match to a variant form. Several pairs of probes
can then be immobilized on the same support for
simultaneous analysis of multiple polymorphisms within the
same target sequence.

2. Tiling Arrays

The polymorphisms can also be identified by
hybridization to nucleic acid arrays, some examples of
which are described in WO 95/11995. One form of such
arrays is described in the Examples section in connection
with de novo identification of polymorphisms. The same
array or a different array can be used for analysis of
characterized polymorphisms. WO 95/11995 also describes
subarrays that are optimized for detection of a variant
form of a precharacterized polymorphism. Such a subarray
contains probes designed to be complementary to a second
reference sequence, which is an allelic variant of the
first reference sequence. The second group of probes is
designed by the same principles as described in the
Examples, except that the probes exhibit complementarity to
the second reference sequence. The inclusion of a second
group (or further groups) can be particularly useful for
analyzing short subsequences of the primary reference
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sequence in which multiple mutations are expected to occur 
within a short distance commensurate with the length of the 
probes (e.g., two or more mutations within 9 to 21 bases). 

3. Allele-Specific Primers

An allele-specific primer hybridizes to a site on
target DNA overlapping a polymorphism and only primes
amplification of an allelic form to which the primer
exhibits perfect complementarity. See Gibbs, Nucleic Acid
Res. 17, 2427-2448 (1989). This primer is used in
conjunction with a second primer which hybridizes at a
distal site. Amplification proceeds from the two primers,
resulting in a detectable product which indicates the
particular allelic form is present. A control is usually
performed with a second pair of primers, one of which shows
a single base mismatch at the polymorphic site and the
other of which exhibits perfect complementarity to a distal
site. The single-base mismatch prevents amplification and
no detectable product is formed. The method works best
when the mismatch is included in the 3'-most position of
the oligonucleotide aligned with the polymorphism because
this position is most destabilizing to elongation from the
primer (see, e.g., WO 93/22456).
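
The placement of the polymorphic base at the 3'-most position of the primer can be illustrated with a short Python sketch. The function names and the 0-based `site` index are hypothetical. The rule that extension occurs only when the 3'-terminal base matches is a deliberate simplification of the behavior described above, not a model of polymerase kinetics. The primer is written in the same sense as the `template` string, so it would anneal to the complementary strand.

```python
def allele_specific_primer(template, site, allele_base, length=20):
    """Return a primer whose 3'-most base lies on the polymorphic site.

    template    -- strand written 5'->3' (primer has the same sense)
    site        -- 0-based index of the polymorphic site in `template`
    allele_base -- base of the allelic form the primer should amplify
    length      -- primer length in nucleotides
    """
    start = site - length + 1
    if start < 0:
        raise ValueError("not enough sequence 5' of the polymorphic site")
    return template[start:site] + allele_base


def primes_allele(primer, sample, site):
    """Simplified rule: extension is assumed to occur only when the
    3'-terminal base of the primer matches the sample at `site`."""
    return primer[-1] == sample[site]
```

Under this simplification, a primer built for one allelic form yields a product from a sample carrying that form and none from a sample carrying the other form, which mirrors the paired control reaction described above.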

4. Direct Sequencing

The direct analysis of the sequence of polymorphisms of
the present invention can be accomplished using either the
dideoxy chain termination method or the Maxam-Gilbert
method (see Sambrook et al., Molecular Cloning, A
Laboratory Manual (2nd Ed., CSHP, New York 1989); Zyskind
et al., Recombinant DNA Laboratory Manual (Acad. Press,
1988)).

5. Denaturing Gradient Gel Electrophoresis

Amplification products generated using the polymerase
chain reaction can be analyzed by the use of denaturing
gradient gel electrophoresis. Different alleles can be
identified based on the different sequence-dependent
melting properties and electrophoretic migration of DNA in
solution. Erlich, ed., PCR Technology, Principles and
Applications for DNA Amplification (W.H. Freeman and Co,
New York, 1992), Chapter 7.

6. Single-Strand Conformation Polymorphism Analysis

Alleles of target sequences can be differentiated using
single-strand conformation polymorphism analysis, which
identifies base differences by alteration in
electrophoretic migration of single-stranded PCR products,
as described in Orita et al., Proc. Nat. Acad. Sci. 86,
2766-2770 (1989). Amplified PCR products can be generated
as described above, and heated or otherwise denatured, to
form single-stranded amplification products. Single-
stranded nucleic acids may refold or form secondary
structures which are partially dependent on the base
sequence. The different electrophoretic mobilities of
single-stranded amplification products can be related to
base-sequence differences between alleles of target
sequences.

III. Methods of Use

After determining polymorphic form(s) present in an
individual at one or more polymorphic sites, this
information can be used in a number of methods.

A. Forensics

Determination of which polymorphic forms occupy a set
of polymorphic sites in an individual identifies a set of
polymorphic forms that distinguishes the individual. See
generally National Research Council, The Evaluation of
Forensic DNA Evidence (Eds. Pollard et al., National
Academy Press, DC, 1996). The more sites that are
analyzed, the lower the probability that the set of
polymorphic forms in one individual is the same as that in
an unrelated individual. Preferably, if multiple sites are
analyzed, the sites are unlinked. Thus, polymorphisms of
the invention are often used in conjunction with
polymorphisms in distal genes. Preferred polymorphisms for
use in forensics are biallelic because the population
frequencies of two polymorphic forms can usually be
determined with greater accuracy than those of multiple
polymorphic forms at multi-allelic loci.

The capacity to identify a distinguishing or unique set
of forensic markers in an individual is useful for forensic
analysis. For example, one can determine whether a blood
sample from a suspect matches a blood or other tissue
sample from a crime scene by determining whether the set of
polymorphic forms occupying selected polymorphic sites is
the same in the suspect and the sample. If the set of
polymorphic markers does not match between a suspect and a
sample, it can be concluded (barring experimental error)
that the suspect was not the source of the sample. If the
set of markers does match, one can conclude that the DNA
from the suspect is consistent with that found at the crime
scene. If frequencies of the polymorphic forms at the loci
tested have been determined (e.g., by analysis of a
suitable population of individuals), one can perform a
statistical analysis to determine the probability that a
match of suspect and crime scene sample would occur by
chance.

p(ID) is the probability that two random individuals
have the same polymorphic or allelic form at a given
polymorphic site. In biallelic loci, four genotypes are
possible: AA, AB, BA, and BB. If alleles A and B occur in
a haploid genome of the organism with frequencies x and y,
the probability of each genotype in a diploid organism is
(see WO 95/12607):

Homozygote: p(AA) = x^2
Homozygote: p(BB) = y^2 = (1-x)^2
Single Heterozygote: p(AB) = p(BA) = xy = x(1-x)
Both Heterozygotes: p(AB+BA) = 2xy = 2x(1-x)

The probability of identity at one locus (i.e., the
probability that two individuals, picked at random from a
population, will have identical polymorphic forms at a given
locus) is given by the equation:

p(ID) = (x^2)^2 + (2xy)^2 + (y^2)^2

These calculations can be extended for any number of
polymorphic forms at a given locus. For example, the
probability of identity p(ID) for a 3-allele system where
the alleles have the frequencies in the population of x, y
and z, respectively, is equal to the sum of the squares of
the genotype frequencies:

p(ID) = x^4 + (2xy)^2 + (2yz)^2 + (2xz)^2 + z^4 + y^4

In a locus of n alleles, the appropriate binomial
expansion is used to calculate p(ID) and p(exc).

The cumulative probability of identity (cum p(ID)) for
each of multiple unlinked loci is determined by multiplying
the probabilities provided by each locus.

cum p(ID) = p(ID1) p(ID2) p(ID3) ... p(IDn)

The cumulative probability of non-identity for n loci
(i.e., the probability that two random individuals will be
different at 1 or more loci) is given by the equation:

cum p(nonID) = 1 - cum p(ID)

If several polymorphic loci are tested, the cumulative
probability of non-identity for random individuals becomes
very high (e.g., one billion to one). Such probabilities
can be taken into account together with other evidence in
determining the guilt or innocence of the suspect.
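
The identity calculations in this section can be checked numerically with a short Python sketch. The function names are hypothetical. The sketch assumes, as the genotype formulas in this section do, that genotype frequencies follow from allele frequencies by random pairing of alleles, and that the loci are unlinked so that per-locus probabilities multiply.

```python
from itertools import combinations_with_replacement
from math import prod


def genotype_frequencies(allele_freqs):
    """Unordered genotype frequencies for one locus: the square of the
    allele frequency for homozygotes, twice the product for heterozygotes."""
    freqs = []
    indices = range(len(allele_freqs))
    for i, j in combinations_with_replacement(indices, 2):
        p = allele_freqs[i] * allele_freqs[j]
        freqs.append(p if i == j else 2 * p)
    return freqs


def p_id(allele_freqs):
    """Probability of identity at one locus: the sum of the squares of
    the genotype frequencies."""
    return sum(g * g for g in genotype_frequencies(allele_freqs))


def cum_p_id(loci):
    """Cumulative probability of identity over unlinked loci."""
    return prod(p_id(freqs) for freqs in loci)


def cum_p_non_id(loci):
    """Cumulative probability of non-identity over unlinked loci."""
    return 1 - cum_p_id(loci)
```

For a biallelic locus with x = y = 0.5, `p_id([0.5, 0.5])` evaluates to 0.375, so under these assumptions each additional locus of that kind multiplies the cumulative probability of identity by 0.375.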

B. Paternity Testing

The object of paternity testing is usually to determine
whether a male is the father of a child. In most cases,
the mother of the child is known and thus, the mother's
contribution to the child's genotype can be traced.
Paternity testing investigates whether the part of the
child's genotype not attributable to the mother is
consistent with that of the putative father. Paternity
testing can be performed by analyzing sets of polymorphisms
in the putative father and the child.

If the set of polymorphisms in the child attributable
to the father does not match the set of polymorphisms of
the putative father, it can be concluded, barring
experimental error, that the putative father is not the
real father. If the set of polymorphisms in the child
attributable to the father does match the set of
polymorphisms of the putative father, a statistical
calculation can be performed to determine the probability
of coincidental match.

The probability of parentage exclusion (representing
the probability that a random male will have a polymorphic
form at a given polymorphic site that makes him
incompatible as the father) is given by the equation (see
WO 95/12607):

p(exc) = xy(1-xy)

where x and y are the population frequencies of alleles A
and B of a biallelic polymorphic site.

(At a triallelic site p(exc) = xy(1-xy) + yz(1-yz) +
xz(1-xz) + 3xyz(1-xyz), where x, y and z are the
respective population frequencies of alleles A, B and C.)

The probability of non-exclusion is

p(non-exc) = 1 - p(exc)

The cumulative probability of non-exclusion
(representing the value obtained when n loci are used) is
thus:

cum p(non-exc) = p(non-exc1) p(non-exc2) p(non-exc3) ...
p(non-excn)

The cumulative probability of exclusion for n loci
(representing the probability that a random male will be
excluded) is:

cum p(exc) = 1 - cum p(non-exc)

If several polymorphic loci are included in the
analysis, the cumulative probability of exclusion of a
random male is very high. This probability can be taken
into account in assessing the liability of a putative
father whose polymorphic marker set matches the child's
polymorphic marker set attributable to his/her father.
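
The exclusion formulas of this section can be evaluated with the following Python sketch. The function names are hypothetical, and the code simply transcribes the biallelic and triallelic expressions for p(exc) quoted above and the product rule for unlinked loci. It does not derive them.

```python
from math import prod


def p_exc_biallelic(x, y):
    """p(exc) = xy(1 - xy) for allele frequencies x and y."""
    return x * y * (1 - x * y)


def p_exc_triallelic(x, y, z):
    """p(exc) for allele frequencies x, y and z at a triallelic site."""
    return (x * y * (1 - x * y)
            + y * z * (1 - y * z)
            + x * z * (1 - x * z)
            + 3 * x * y * z * (1 - x * y * z))


def cum_p_exc(p_exc_values):
    """Cumulative probability of exclusion over n loci:
    1 minus the product of the per-locus p(non-exc) = 1 - p(exc)."""
    return 1 - prod(1 - p for p in p_exc_values)
```

With x = y = 0.5, `p_exc_biallelic` gives 0.1875 per locus. Passing ten such values to `cum_p_exc` gives a cumulative probability of exclusion of roughly 0.87, which illustrates how the cumulative value rises as loci are added.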

C. Correlation of Polymorphisms with Phenotypic Traits

The polymorphisms of the invention may contribute to
the phenotype of an organism in different ways. Some
polymorphisms occur within a protein coding sequence and
contribute to phenotype by affecting protein structure.
The effect may be neutral, beneficial or detrimental, or
both beneficial and detrimental, depending on the
circumstances. For example, a heterozygous sickle cell
mutation confers resistance to malaria, but a homozygous
sickle cell mutation is usually lethal. Other
polymorphisms occur in noncoding regions but may exert
phenotypic effects indirectly via influence on replication,
transcription, and translation. A single polymorphism may
affect more than one phenotypic trait. Likewise, a single
phenotypic trait may be affected by polymorphisms in
different genes. Further, some polymorphisms predispose an
individual to a distinct mutation that is causally related
to a certain phenotype.

Phenotypic traits include diseases that have known but
hitherto unmapped genetic components (e.g.,
agammaglobulinemia, diabetes insipidus, Lesch-Nyhan
syndrome, muscular dystrophy, Wiskott-Aldrich syndrome,
Fabry's disease, familial hypercholesterolemia, polycystic
kidney disease, hereditary spherocytosis, von Willebrand's
disease, tuberous sclerosis, hereditary hemorrhagic
telangiectasia, familial colonic polyposis, Ehlers-Danlos
syndrome, osteogenesis imperfecta, and acute intermittent
porphyria). Phenotypic traits also include symptoms of, or
susceptibility to, multifactorial diseases of which a
component is or may be genetic, such as autoimmune
diseases, inflammation, cancer, diseases of the nervous
system, and infection by pathogenic microorganisms. Some
examples of autoimmune diseases include rheumatoid
arthritis, multiple sclerosis, diabetes (insulin-dependent
and non-insulin-dependent), systemic lupus erythematosus and
Graves disease. Some examples of cancers include cancers
of the bladder, brain, breast, colon, esophagus, kidney,
leukemia, liver, lung, oral cavity, ovary, pancreas,
prostate, skin, stomach and uterus. Phenotypic traits also
include characteristics such as longevity, appearance
(e.g., baldness, obesity), strength, speed, endurance,
fertility, and susceptibility or receptivity to particular
drugs or therapeutic treatments.

Correlation is performed for a population of
individuals who have been tested for the presence or
absence of a phenotypic trait of interest and for
polymorphic marker sets. To perform such analysis, the
presence or absence of a set of polymorphisms (i.e., a
polymorphic set) is determined for a set of the
individuals, some of whom exhibit a particular trait, and
some of whom lack the trait. The alleles of
each polymorphism of the set are then reviewed to determine
whether the presence or absence of a particular allele is
associated with the trait of interest. Correlation can be
performed by standard statistical methods such as a
chi-squared test, and statistically significant correlations
between polymorphic form(s) and phenotypic characteristics
are noted. For example, it might be found that the
presence of allele A1 at polymorphism A correlates with
heart disease. As a further example, it might be found
that the combined presence of allele A1 at polymorphism A
and allele B1 at polymorphism B correlates with increased
milk production of a farm animal.
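
As one hedged illustration of such a standard statistical method, the Pearson chi-squared statistic for a 2 x 2 table of allele presence against trait presence can be computed as below. The function name and the layout of the four counts are assumptions of this sketch. The specification does not prescribe a particular test procedure, and no continuity correction is applied here.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for a 2 x 2 contingency table.

    a -- individuals with the allele and with the trait
    b -- individuals with the allele and without the trait
    c -- individuals without the allele and with the trait
    d -- individuals without the allele and without the trait
    """
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    n = a + b + c + d
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one count")
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic
```

The statistic has one degree of freedom, so a value above 3.841 corresponds to significance at the conventional 0.05 level. For counts of 30, 10, 10 and 30 the function returns 20.0.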

Such correlations can be exploited in several ways. In
the case of a strong correlation between a set of one or
more polymorphic forms and a disease for which treatment is
available, detection of the polymorphic form set in a human
or animal patient may justify immediate administration of
treatment, or at least the institution of regular
monitoring of the patient. Detection of a polymorphic form
correlated with serious disease in a couple contemplating a
family may also be valuable to the couple in their
reproductive decisions. For example, the female partner
might elect to undergo in vitro fertilization to avoid the
possibility of transmitting such a polymorphism from her
husband to her offspring. In the case of a weaker, but
still statistically significant correlation between a
polymorphic set and human disease, immediate therapeutic
intervention or monitoring may not be justified.
Nevertheless, the patient can be motivated to begin simple
life-style changes (e.g., diet, exercise) that can be
accomplished at little cost to the patient but confer
potential benefits in reducing the risk of conditions to
which the patient may have increased susceptibility by
virtue of variant alleles. Identification of a polymorphic
set in a patient correlated with enhanced receptiveness to
one of several treatment regimes for a disease indicates
that this treatment regime should be followed.

For animals and plants, correlations between
characteristics and phenotype are useful for breeding for
desired characteristics. For example, Beitz et al., US
5,292,639 discuss use of bovine mitochondrial polymorphisms
in a breeding program to improve milk production in cows.
To evaluate the effect of mtDNA D-loop sequence
polymorphism on milk production, each cow was assigned a
value of 1 if variant or 0 if wildtype with respect to a
prototypical mitochondrial DNA sequence at each of 17
locations considered. Each production trait was analyzed
individually with the following animal model:

Y_{ijkpn} = \mu + YS_i + P_j + X_k + \beta_1 + ... + \beta_{17} + PE_n + a_n + e_p

where Y_{ijkpn} is the milk, fat, fat percentage, SNF, SNF
percentage, energy concentration, or lactation energy
record; \mu is an overall mean; YS_i is the effect common to
all cows calving in year-season; X_k is the effect common to
cows in either the high or average selection line; \beta_1 to
\beta_{17} are the binomial regressions of production record
on mtDNA D-loop sequence polymorphisms; PE_n is permanent
environmental effect common to all records of cow n; a_n is
effect of animal n and is composed of the additive genetic
contribution of sire and dam breeding values and a
Mendelian sampling effect; and e_p is a random residual. It
was found that eleven of seventeen polymorphisms tested
influenced at least one production trait. Bovines having
the best polymorphic forms for milk production at these
eleven loci are used as parents for breeding the next
generation of the herd.

D. Genetic Mapping of Phenotypic Traits

The previous section concerns identifying correlations
between phenotypic traits and polymorphisms that directly
or indirectly contribute to those traits. The present
section describes identification of a physical linkage
between a genetic locus associated with a trait of interest
and polymorphic markers that are not associated with the
trait, but are in physical proximity with the genetic locus
responsible for the trait and co-segregate with it. Such
analysis is useful for mapping a genetic locus associated
with a phenotypic trait to a chromosomal position, and
thereby cloning gene(s) responsible for the trait. See
Lander et al., Proc. Natl. Acad. Sci. (USA) 83, 7353-7357
(1986); Lander et al., Proc. Natl. Acad. Sci. (USA) 84,
2363-2367 (1987); Donis-Keller et al., Cell 51, 319-337
(1987); Lander et al., Genetics 121, 185-199 (1989).
Genes localized by linkage can be cloned by a process known
as directional cloning. See Wainwright, Med. J. Australia
159, 170-174 (1993); Collins, Nature Genetics 1, 3-6
(1992).

Linkage studies are typically performed on members of a
family. Available members of the family are characterized
for the presence or absence of a phenotypic trait and for a
set of polymorphic markers. The distribution of
polymorphic markers in an informative meiosis is then
analyzed to determine which polymorphic markers
co-segregate with a phenotypic trait. See, e.g., Kerem et
al., Science 245, 1073-1080 (1989); Monaco et al., Nature
316, 842 (1985); Yamoka et al., Neurology 40, 222-226
(1990); Rossiter et al., FASEB Journal 5, 21-27 (1991).

Linkage is analyzed by calculation of LOD (log of the
odds) values. A lod value is the relative likelihood of
obtaining observed segregation data for a marker and a
genetic locus when the two are located at a recombination
fraction θ, versus the situation in which the two are not
linked, and thus segregating independently (Thompson &
Thompson, Genetics in Medicine (5th ed, W.B. Saunders
Company, Philadelphia, 1991); Strachan, "Mapping the human
genome" in The Human Genome (BIOS Scientific Publishers
Ltd, Oxford), Chapter 4). A series of likelihood ratios
is calculated at various recombination fractions (θ),
ranging from θ = 0.0 (coincident loci) to θ = 0.50
(unlinked). Thus, the likelihood at a given value of θ is
the ratio of the probability of the data if the loci are
linked at θ to the probability of the data if the loci are
unlinked. The computed likelihoods are
usually expressed as the log10 of this ratio (i.e., a lod
score). For example, a lod score of 3 indicates 1000:1
odds against an apparent observed linkage being a
coincidence. The use of logarithms allows data collected
from different families to be combined by simple addition.
Computer programs are available for the calculation of lod
scores for differing values of θ (e.g., LIPED, MLINK
(Lathrop, Proc. Nat. Acad. Sci. (USA) 81, 3443-3446
(1984))). For any particular lod score, a recombination
fraction may be determined from mathematical tables. See
Smith et al., Mathematical tables for research workers in
human genetics (Churchill, London, 1961); Smith, Ann. Hum.
Genet. 32, 127-150 (1968). The value of θ at which the lod
score is the highest is considered to be the best estimate
of the recombination fraction.

Positive lod score values suggest that the two loci are
linked, whereas negative values suggest that linkage is
less likely (at that value of θ) than the possibility that
the two loci are unlinked. By convention, a combined lod
score of +3 or greater (equivalent to greater than 1000:1
odds in favor of linkage) is considered definitive evidence
that two loci are linked. Similarly, by convention, a
negative lod score of -2 or less is taken as definitive
evidence against linkage of the two loci being compared.
Negative linkage data are useful in excluding a chromosome
or a segment thereof from consideration. The search
focuses on the remaining non-excluded chromosomal
locations.
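
The lod score calculation can be illustrated for the simplest textbook case. The Python sketch below assumes fully informative, phase-known meioses, so that the likelihood of observing a given number of recombinants among the scored meioses is proportional to θ^recombinants (1-θ)^nonrecombinants under linkage and to 0.5^meioses under no linkage. This simplification and the function names are assumptions of the sketch. Programs such as LIPED and MLINK handle general pedigrees and are not reproduced here.

```python
from math import log10


def lod_score(recombinants, meioses, theta):
    """lod score at recombination fraction theta (0 <= theta <= 0.5)
    for phase-known, fully informative meioses."""
    if not 0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    nonrecombinants = meioses - recombinants
    linked = theta ** recombinants * (1 - theta) ** nonrecombinants
    unlinked = 0.5 ** meioses
    if linked == 0:
        # theta = 0 is incompatible with any observed recombinant
        return float("-inf")
    return log10(linked / unlinked)


def best_theta(recombinants, meioses, steps=50):
    """Grid search over theta = 0.00 ... 0.50 for the highest lod score."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: lod_score(recombinants, meioses, t))
```

Because the scores are logarithms, results from independent families at the same θ add. In this model, for example, `lod_score(1, 10, 0.1) + lod_score(2, 10, 0.1)` equals `lod_score(3, 20, 0.1)`.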

IV. Modified Polypeptides and Gene Sequences

The invention further provides variant forms of nucleic
acids and corresponding proteins. The nucleic acids
comprise one of the sequences described in the Table,
column 8, in which the polymorphic position is occupied by
one of the alternative bases for that position. Some
nucleic acids encode full-length variant forms of proteins.
Similarly, variant proteins have the prototypical amino
acid sequences encoded by nucleic acid sequences shown in
the Table, column 8 (read so as to be in-frame with the
full-length coding sequence of which it is a component),
except at an amino acid encoded by a codon including one of
the polymorphic positions shown in the Table. That
position is occupied by the amino acid coded by the
corresponding codon in any of the alternative forms shown
in the Table.

Variant genes can be expressed in an expression vector
in which a variant gene is operably linked to a native or
other promoter. Usually, the promoter is a eukaryotic
promoter for expression in a mammalian cell. The
transcription regulation sequences typically include a
heterologous promoter and optionally an enhancer which is
recognized by the host. The selection of an appropriate
promoter, for example trp, lac, phage promoters, glycolytic
enzyme promoters and tRNA promoters, depends on the host
selected. Commercially available expression vectors can be
used. Vectors can include host-recognized replication
systems, amplifiable genes, selectable markers, host
sequences useful for insertion into the host genome, and
the like.

The means of introducing the expression construct into
a host cell varies depending upon the particular
construction and the target host. Suitable means include
fusion, conjugation, transfection, transduction,
electroporation or injection, as described in Sambrook,
supra. A wide variety of host cells can be employed for
expression of the variant gene, both prokaryotic and
eukaryotic. Suitable host cells include bacteria such as
E. coli, yeast, filamentous fungi, insect cells, mammalian
cells, typically immortalized, e.g., mouse, CHO, human and
monkey cell lines and derivatives thereof. Preferred host
cells are able to process the variant gene product to
produce an appropriate mature polypeptide. Processing
includes glycosylation, ubiquitination, disulfide bond
formation, general post-translational modification, and the
like.

The protein may be isolated by conventional means of
protein biochemistry and purification to obtain a
substantially pure product, i.e., 80, 95 or 99% free of
cell component contaminants, as described in Jacoby,
Methods in Enzymology Volume 104, Academic Press, New York
(1984); Scopes, Protein Purification, Principles and
Practice, 2nd Edition, Springer-Verlag, New York (1987);
and Deutscher (ed), Guide to Protein Purification, Methods
in Enzymology, Vol. 182 (1990). If the protein is
secreted, it can be isolated from the supernatant in which
the host cell is grown. If not secreted, the protein can
be isolated from a lysate of the host cells.

The invention further provides transgenic nonhuman
animals capable of expressing an exogenous variant gene
and/or having one or both alleles of an endogenous variant
gene inactivated. Expression of an exogenous variant gene
is usually achieved by operably linking the gene to a
promoter and optionally an enhancer, and microinjecting the
construct into a zygote. See Hogan et al., "Manipulating
the Mouse Embryo, A Laboratory Manual," Cold Spring Harbor
Laboratory. Inactivation of endogenous variant genes can
be achieved by forming a transgene in which a cloned
variant gene is inactivated by insertion of a positive
selection marker. See Capecchi, Science 244, 1288-1292
(1989). The transgene is then introduced into an embryonic
stem cell, where it undergoes homologous recombination with
an endogenous variant gene. Mice and other rodents are
preferred animals. Such animals provide useful drug
screening systems.

In addition to substantially full-length polypeptides
expressed by variant genes, the present invention includes
biologically active fragments of the polypeptides, or
analogs thereof, including organic molecules which simulate
the interactions of the peptides. Biologically active
fragments include any portion of the full-length
polypeptide which confers a biological function on the
variant gene product, including ligand binding, and
antibody binding. Ligand binding includes binding by
nucleic acids, proteins or polypeptides, small biologically
active molecules, or large cellular structures.

Polyclonal and/or monoclonal antibodies that
specifically bind to variant gene products but not to
corresponding prototypical gene products are also provided.
Antibodies can be made by injecting mice or other animals
with the variant gene product or synthetic peptide
fragments thereof. Monoclonal antibodies are screened as
described, for example, in Harlow & Lane, Antibodies, A
Laboratory Manual, Cold Spring Harbor Press, New York
(1988); Goding, Monoclonal antibodies, Principles and
Practice (2d ed.) Academic Press, New York (1986).
Monoclonal antibodies are tested for specific
immunoreactivity with a variant gene product and lack of
immunoreactivity to the corresponding prototypical gene
product. These antibodies are useful in diagnostic assays
for detection of the variant form, or as an active
ingredient in a pharmaceutical composition.

V. Kits

The invention further provides kits comprising at least
one allele-specific oligonucleotide as described above.
Often, the kits contain one or more pairs of allele-
specific oligonucleotides hybridizing to different forms of
a polymorphism. In some kits, the allele-specific
oligonucleotides are provided immobilized to a substrate.
For example, the same substrate can comprise allele-
specific oligonucleotide probes for detecting at least 10,
100 or all of the polymorphisms shown in the Table.
Optional additional components of the kit include, for
example, restriction enzymes, reverse transcriptase or
polymerase, the substrate nucleoside triphosphates, means
used to label (for example, an avidin-enzyme conjugate and
enzyme substrate and chromogen if the label is biotin), and
the appropriate buffers for reverse transcription, PCR, or
hybridization reactions. Usually, the kit also contains
instructions for carrying out the methods.

The following Examples are offered for the purpose of
illustrating the present invention and are not to be
construed to limit the scope of this invention. The
teachings of all references cited herein are hereby
incorporated herein by reference.






EXAMPLES 

The polymorphisms shown in the Table were identified by
resequencing of target sequences from three to ten
unrelated individuals of diverse ethnic and geographic
backgrounds by hybridization to probes immobilized to
microfabricated arrays or conventional sequencing. The
strategy and principles for design and use of such arrays
are generally described in WO 95/11995. The strategy
provides arrays of probes for analysis of target sequences
showing a high degree of sequence identity to the reference
sequences of the fragments shown in the Table, column 1.
The reference sequences were sequence-tagged sites (STSs)
developed in the course of the Human Genome Project (see,
e.g., Science 270, 1945-1954 (1995); Nature 380, 152-154
(1996)). Most STSs ranged from 100 bp to 300 bp in size.

A typical probe array used in this analysis has two
groups of four sets of probes that respectively tile both
strands of a reference sequence. A first probe set
comprises a plurality of probes exhibiting perfect
complementarity with one of the reference sequences. Each
probe in the first probe set has an interrogation position
that corresponds to a nucleotide in the reference sequence.
That is, the interrogation position is aligned with the
corresponding nucleotide in the reference sequence, when
the probe and reference sequence are aligned to maximize
complementarity between the two. For each probe in the
first set, there are three corresponding probes from three
additional probe sets. Thus, there are four probes
corresponding to each nucleotide in the reference sequence.
The probes from the three additional probe sets are
identical to the corresponding probe from the first probe
set except at the interrogation position, which occurs in
the same position in each of the four corresponding probes
from the four probe sets, and is occupied by a different
nucleotide in the four probe sets. In the present
analysis, probes were 25 nucleotides long. Arrays tiled
for multiple different reference sequences were included
on the same substrate.
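
The four-probe-set tiling just described can be sketched in Python for one strand of a reference sequence. The function name is hypothetical, and placing the interrogation position at the center of an odd-length probe is an assumption of this sketch. The description above only requires that the interrogation position be the same in the four corresponding probes.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def tile_reference(reference, probe_length=25):
    """Return {reference position: {interrogation base: probe}}.

    For each reference nucleotide that can sit at the center of a probe,
    four probes are produced. They are identical except at the central
    interrogation position, which carries A, C, G or T. The probe whose
    interrogation base is the complement of the reference nucleotide is
    the perfectly complementary member of the first probe set.
    """
    if probe_length % 2 == 0:
        raise ValueError("this sketch assumes an odd probe length")
    half = probe_length // 2
    tiles = {}
    for pos in range(half, len(reference) - half):
        window = reference[pos - half:pos + half + 1]
        perfect = window.translate(COMPLEMENT)[::-1]  # reverse complement
        tiles[pos] = {
            base: perfect[:half] + base + perfect[half + 1:]
            for base in "ACGT"
        }
    return tiles
```

Tiling the second strand would amount to calling the same function on the reverse complement of the reference, which gives the two groups of four probe sets mentioned above.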

Multiple target sequences from an individual were
amplified from human genomic DNA using primers for the
fragments indicated in the listed Web sites. The amplified
target sequences were fluorescently labelled during or
after PCR. The labelled target sequences were hybridized
with a substrate bearing immobilized arrays of probes. The
amount of label bound to probes was measured. Analysis of
the pattern of label revealed the nature and position of
differences between the target and reference sequence. For
example, comparison of the intensities of four
corresponding probes reveals the identity of a
corresponding nucleotide in the target sequences aligned
with the interrogation position of the probes. The
corresponding nucleotide is the complement of the
nucleotide occupying the interrogation position of the
probe showing the highest intensity (see WO 95/11995). The
existence of a polymorphism is also manifested by
differences in normalized hybridization intensities of
probes flanking the polymorphism when the probes hybridized
to corresponding targets from different individuals. For
example, relative loss of hybridization intensity in a
"footprint" of probes flanking a polymorphism signals a
difference between the target and reference (i.e., a
polymorphism) (see EP 717,113). Additionally,
hybridization intensities for corresponding targets from
different individuals can be classified into groups or
clusters suggested by the data, not defined a priori, such
that isolates in a given cluster tend to be similar and
isolates in different clusters tend to be dissimilar.
Hybridizations to samples from different individuals were
performed separately. The Table summarizes the data
obtained for target sequences in comparison with a
reference sequence for the individuals tested.
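
The base-calling rule stated above (the target nucleotide is the complement of the interrogation-position nucleotide of the brightest of four corresponding probes) can be sketched in Python. The function names and the dictionary layout are hypothetical. Ties between probes, intensity normalization, and the footprint and clustering analyses are not handled in this sketch.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_base(intensities):
    """Call the target nucleotide from four corresponding probes.

    intensities -- dict mapping the nucleotide at each probe's
                   interrogation position to its measured intensity
    Returns the complement of the interrogation nucleotide of the
    probe showing the highest intensity.
    """
    brightest = max(intensities, key=intensities.get)
    return COMPLEMENT[brightest]


def find_differences(reference, intensities_by_position):
    """Return {position: (reference base, called base)} for every
    position where the called base differs from the reference."""
    differences = {}
    for pos, four in intensities_by_position.items():
        called = call_base(four)
        if called != reference[pos]:
            differences[pos] = (reference[pos], called)
    return differences
```

Positions reported by `find_differences` are candidate differences between target and reference. In the analysis described above, such calls were considered together with the footprint and clustering evidence.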

From the foregoing, it is apparent that the invention
includes a number of general uses that can be expressed
concisely as follows. The invention provides for the use
of any of the nucleic acid segments described above in the
diagnosis or monitoring of diseases, such as cancer,
inflammation, heart disease, diseases of the CNS, and
susceptibility to infection by microorganisms. The
invention further provides for the use of any of the
nucleic acid segments in the manufacture of a medicament
for the treatment or prophylaxis of such diseases. The
invention further provides for the use of any of the DNA
segments as a pharmaceutical.

All publications and patent applications cited above
are incorporated by reference in their entirety for all
purposes to the same extent as if each individual
publication or patent application were specifically and
individually indicated to be so incorporated by reference.

EQUIVALENTS

While this invention has been particularly shown and
described with references to preferred embodiments thereof,
it will be understood by those skilled in the art that
various changes in form and details may be made therein
without departing from the spirit and scope of the
invention as defined by the appended claims. Those skilled
in the art will recognize or be able to ascertain using no
more than routine experimentation, many equivalents to the
specific embodiments of the invention described
specifically herein. Such equivalents are intended to be
encompassed in the scope of the claims.

CLAIMS

WE CLAIM: 

1. A nucleic acid segment shown in column 7 of the Table,
or a portion thereof which includes a polymorphic site,
or the complement of the segment or portion thereof.

2. The nucleic acid segment of claim 1 that is DNA.

3. The nucleic acid segment of claim 1 that is RNA.

4. The segment of claim 1 that is less than 100 bases.

5. The segment of claim 1 that is less than 50 bases.

6. The segment of claim 1 that is less than 20 bases.

7. The segment of claim 1, wherein the polymorphic site is
biallelic.

8. The segment of claim 1, wherein the polymorphic form
occupying the polymorphic site is the reference base
for the fragment listed in the Table, column 3.

9. The segment of claim 1, wherein the polymorphic form
occupying the polymorphic site is an alternative form
for the fragment listed in the Table, column 4.

10. An allele-specific oligonucleotide that hybridizes to a
segment of a fragment shown in the Table, column 7 or
its complement.



11. The allele-specific oligonucleotide of claim 10 that is
a probe.

12. The allele-specific oligonucleotide of claim 10,
wherein a central position of the probe aligns with the
polymorphic site of the fragment.

13. The allele-specific oligonucleotide of claim 10 that is
a primer.

14. The allele-specific oligonucleotide of claim 13,
wherein the 3' end of the primer aligns with the
polymorphic site of the fragment.

15. The allele-specific oligonucleotide of claim 10, which
is selected from the group consisting of the nucleotide
sequences of the Table, column 5.



16. The allele-specific oligonucleotide of claim 10, which
is selected from the group consisting of the nucleotide
sequences of the Table, column 6.

17. An isolated nucleic acid comprising a sequence of the
Table, column 7 or the complement thereof, wherein the
polymorphic site within the sequence or complement is
occupied by a base other than the reference base shown
in the Table, column 3.

18. A method of analyzing a nucleic acid, comprising
obtaining the nucleic acid from an individual; and
determining a base occupying any one of the polymorphic
sites shown in the Table.

19. The method of claim 18, wherein the determining
comprises determining a set of bases occupying a set of
the polymorphic sites shown in the Table.

20. The method of claim 18, wherein the nucleic acid is
obtained from a plurality of individuals, and a base
occupying one of the polymorphic positions is
determined in each of the individuals, and the method
further comprising testing each individual for the
presence of a disease phenotype, and correlating the
presence of the disease phenotype with the base.

CROSS-REFERENCE TO RELATED APPLICATIONS
The present application derives priority from USSN
60/049,612 filed June 13, 1997, which is incorporated by
reference in its entirety for all purposes.

BACKGROUND 

The genomes of all organisms undergo spontaneous
mutation in the course of their continuing evolution,
generating variant forms of progenitor sequences (Gusella,
Ann. Rev. Biochem. 55, 831-854 (1986)). The variant form may
confer an evolutionary advantage or disadvantage relative to a
progenitor form or may be neutral. In some instances, a
variant form confers a lethal disadvantage and is not
transmitted to subsequent generations of the organism. In
other instances, a variant form confers an evolutionary
advantage to the species and is eventually incorporated into
the DNA of many or most members of the species and effectively
becomes the progenitor form. In many instances, both
progenitor and variant form(s) survive and co-exist in a
species population. The coexistence of multiple forms of a
sequence gives rise to polymorphisms.

Several different types of polymorphism have been
reported. A restriction fragment length polymorphism (RFLP)
means a variation in DNA sequence that alters the length of a
restriction fragment as described in Botstein et al., Am. J.
Hum. Genet. 32, 314-331 (1980). Other polymorphisms take the
form of short tandem repeats (STRs) that include tandem di-,
tri- and tetra-nucleotide repeated motifs. Some polymorphisms
take the form of single nucleotide variations between
individuals of the same species. Such polymorphisms are far
more frequent than RFLPs, STRs and VNTRs. Single nucleotide
polymorphisms can occur anywhere in protein-coding sequences,
intronic sequences, regulatory sequences, or intergenomic
regions.

Many polymorphisms probably have little or no
phenotypic effect. Some polymorphisms, principally those
occurring within coding sequences, are known to be the direct
cause of serious genetic diseases, such as sickle cell anemia.
Polymorphisms occurring within a coding sequence typically
exert their phenotypic effect by leading to a truncated or
altered expression product. Still other polymorphisms,
particularly those in promoter regions and other regulatory
sequences, may influence a range of disease-susceptibility,
behavioral and other phenotypic traits through their effect on
gene expression levels. That is, such polymorphisms may lead
to increased or decreased levels of gene expression without
necessarily affecting the nature of the expression product.

SUMMARY OF THE INVENTION
The invention provides methods of monitoring
expression levels of different polymorphic forms of a gene.
Such methods entail analyzing genomic DNA from an individual
to determine the presence of heterozygous polymorphic forms at
a polymorphic site within a transcribed sequence of a gene of
interest. RNA from a tissue of the individual in which the
gene is expressed is then analyzed to determine relative
proportions of polymorphic forms in transcript of the gene.

WO 98/56954
PCT/US98/12442

In some methods, genomic DNA is analyzed by 
amplifying a segment of genomic DNA from a sample and 
hybridizing the amplified genomic DNA to an array of 
immobilized probes. In some methods the array used for 
analyzing genomic DNA comprises a first probe group comprising 
one or more probes exactly complementary to a first 
polymorphic form of the gene and a second probe group 
comprising one or more probes exactly complementary to a 
second polymorphic form of the gene. In some methods, RNA is 
analyzed by reverse transcribing and amplifying mRNA expressed 
from the gene to produce an amplified nucleic acid and 
hybridizing the amplified nucleic acid to an array of 
immobilized probes. In some such methods, the amplified 
nucleic acid is cDNA. In some methods, the array of 
immobilized probes for analyzing RNA comprises a first probe 
group comprising one or more probes exactly complementary to a 
first polymorphic form of the gene, and a second probe group
comprising one or more probes exactly complementary to a
second polymorphic form of the gene.

In some methods, genomic DNA and the RNA are analyzed
by hybridizing the genomic DNA or an amplification product 
thereof, and the RNA or an amplification product thereof, to 
the same array of immobilized probes comprising a first probe 
group comprising one or more probes exactly complementary to a 
first polymorphic form of the gene, and a second probe group 
comprising one or more probes exactly complementary to a 
second polymorphic form of the gene. 

In some methods, the genomic DNA, or amplification product, 
and the RNA, or amplification product, bear different labels 
and are hybridized simultaneously to the array. 

Some methods further comprise comparing a genomic 
DNA hybridization intensity of the first probe group to the
subset to identify a further polymorphism in a promoter,
enhancer or intronic sequence of the gene. 



DEFINITIONS 
A nucleic acid is a deoxyribonucleotide or
ribonucleotide polymer in either single- or double-stranded
form, including known analogs of natural nucleotides unless
otherwise indicated.

An oligonucleotide is a single-stranded nucleic acid
ranging in length from 2 to about 500 bases. Oligonucleotides
are often synthetic but can also be produced from naturally
occurring polynucleotides.

A probe is an oligonucleotide capable of binding to
a target nucleic acid of complementary sequence through one or
more types of chemical bonds, usually through complementary
base pairing, usually through hydrogen bond formation.
Oligonucleotide probes are often 10-50 or 15-30 bases long.
An oligonucleotide probe may include natural (i.e., A, G, C, or
T) or modified bases (7-deazaguanosine, inosine, etc.). In
addition, the bases in an oligonucleotide probe may be joined by
a linkage other than a phosphodiester bond, so long as it does
not interfere with hybridization. Thus, oligonucleotide
probes may be peptide nucleic acids in which the constituent
bases are joined by peptide bonds rather than phosphodiester
linkages.

Specific hybridization refers to the binding,
duplexing, or hybridizing of a molecule only to a particular
nucleotide sequence under stringent conditions when that
sequence is present in a complex mixture (e.g., total
cellular) DNA or RNA. Stringent conditions are conditions
under which a probe will hybridize to its target subsequence,
but to no other sequences. Stringent conditions are
sequence-dependent and are different in different circumstances.
Longer sequences hybridize specifically at higher
temperatures. Generally, stringent conditions are selected to
be about 5°C lower than the thermal melting point (Tm) for the
specific sequence at a defined ionic strength and pH. The Tm
is the temperature (under defined ionic strength, pH, and
nucleic acid concentration) at which 50% of the probes
complementary to the target sequence hybridize to the target
sequence at equilibrium. (As the target sequences are
generally present in excess, at Tm, 50% of the probes are
occupied at equilibrium). Typically, stringent conditions
include a salt concentration of at least about 0.01 to 1.0 M
Na ion concentration (or other salts) at pH 7.0 to 8.3 and the
temperature is at least about 30°C for short probes (e.g., 10
to 50 nucleotides). Stringent conditions can also be achieved
with the addition of destabilizing agents such as formamide.
For example, conditions of 5X SSPE (750 mM NaCl, 50 mM
NaPhosphate, 5 mM EDTA, pH 7.4) and a temperature of 25-30°C
are suitable for allele-specific probe hybridizations.

A perfectly matched probe has a sequence perfectly
complementary to a particular target sequence. The test probe
is typically perfectly complementary to a portion
(subsequence) of the target sequence. The term "mismatch
probe" refers to probes whose sequence is deliberately selected
not to be perfectly complementary to a particular target
sequence. Although the mismatch(es) may be located anywhere in
the mismatch probe, terminal mismatches are less desirable as
a terminal mismatch is less likely to prevent hybridization of
the target sequence. Thus, probes are often designed to have
the mismatch located at or near the center of the probe such
that the mismatch is most likely to destabilize the duplex
with the target sequence under the test hybridization
conditions.

Transcription levels can be quantified absolutely
or relatively. Absolute quantification can be accomplished by
inclusion of known concentration(s) of one or more target
nucleic acids (e.g., control nucleic acids such as Bio B, or
known amounts of the target nucleic acids themselves) and
referencing the hybridization intensity of unknowns with the
known target nucleic acids (e.g., through generation of a
standard curve). Alternatively, relative quantification can
be accomplished by comparison of hybridization signals between
two or more polymorphic forms of a transcript.
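
As a hedged illustration of the two quantification modes just described, the sketch below fits a straight-line standard curve to control nucleic acids of known concentration and inverts it for an unknown, and separately computes a relative proportion from two signals. The linear intensity model, the function names and all numbers are assumptions for the example; real hybridization signals may saturate and require a different curve.

```python
def fit_standard_curve(concentrations, intensities):
    """Least-squares line through (concentration, intensity) control points.

    Returns (slope, intercept). A linear response is an assumption of this
    sketch, not something stated in the text.
    """
    n = len(concentrations)
    mean_c = sum(concentrations) / n
    mean_i = sum(intensities) / n
    sxx = sum((c - mean_c) ** 2 for c in concentrations)
    sxy = sum((c - mean_c) * (i - mean_i)
              for c, i in zip(concentrations, intensities))
    slope = sxy / sxx
    return slope, mean_i - slope * mean_c


def absolute_level(intensity, slope, intercept):
    """Read an unknown's concentration off the fitted standard curve."""
    return (intensity - intercept) / slope


def relative_proportion(signal_form1, signal_form2):
    """Fraction of total signal attributable to the first polymorphic form."""
    return signal_form1 / (signal_form1 + signal_form2)


# Made-up control data (arbitrary units) lying on intensity = 100*c + 10.
slope, intercept = fit_standard_curve([1.0, 2.0, 4.0, 8.0],
                                      [110.0, 210.0, 410.0, 810.0])
print(absolute_level(510.0, slope, intercept))   # close to 5.0
print(relative_proportion(300.0, 100.0))         # 0.75
```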

A polymorphic marker or site is the locus at which
divergence occurs. Preferred markers have at least two
alleles, each occurring at a frequency of greater than 1%, and
more preferably greater than 10% or 20% of a selected
population. A polymorphic locus may be as small as one base
pair. Polymorphic markers include restriction fragment length
polymorphisms, variable number of tandem repeats (VNTR's),
hypervariable regions, minisatellites, dinucleotide repeats,
trinucleotide repeats, tetranucleotide repeats, simple
sequence repeats, and insertion elements such as Alu. The
first identified allelic form is arbitrarily designated as
the reference form and other allelic forms are designated as
alternative or variant alleles. The allelic form occurring
most frequently in a selected population is sometimes referred
to as the wildtype form. Diploid organisms may be homozygous
or heterozygous for allelic forms. A diallelic polymorphism
has two forms. A triallelic polymorphism has three forms.
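
The frequency criterion for preferred markers can be checked with a short sketch such as the following. The genotype representation (a pair of allele symbols per diploid individual), the function names and the sample population are hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from diploid genotypes given as (allele, allele) pairs."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_preferred_marker(genotypes, min_frequency=0.01):
    """True if at least two alleles each exceed `min_frequency` in the sample."""
    frequencies = allele_frequencies(genotypes)
    common = [a for a, f in frequencies.items() if f > min_frequency]
    return len(common) >= 2


# Made-up sample of 100 individuals: 90 A/A, 9 A/G and 1 G/G.
population = [("A", "A")] * 90 + [("A", "G")] * 9 + [("G", "G")]
print(allele_frequencies(population))
print(is_preferred_marker(population, 0.01))  # True: G exceeds 1%
print(is_preferred_marker(population, 0.10))  # False: G is below 10%
```
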
A single nucleotide polymorphism (SNP) occurs at a
polymorphic site occupied by a single nucleotide, which is the
site of variation between allelic sequences. The site is
usually preceded by and followed by highly conserved sequences
of the allele (e.g., sequences that vary in less than 1/100 or
1/1000 members of the populations).

A single nucleotide polymorphism usually arises due
to substitution of one nucleotide for another at the
polymorphic site. A transition is the replacement of one
purine by another purine or one pyrimidine by another
pyrimidine. A transversion is the replacement of a purine by
a pyrimidine or vice versa. Single nucleotide polymorphisms
can also arise from a deletion of a nucleotide or an insertion
of a nucleotide relative to a reference allele.
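
The transition and transversion definitions above translate directly into code. This is a minimal sketch; the function name is illustrative and only DNA bases are handled.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(reference_base, variant_base):
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = reference_base.upper(), variant_base.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and variant bases are identical")
    # Same chemical class (purine/purine or pyrimidine/pyrimidine) means
    # transition; a change of class means transversion.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"))  # transition
print(classify_substitution("C", "A"))  # transversion
```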

DESCRIPTION 

I. General

A substantial number of polymorphic sites in humans
and other species have been described in the published
literature, and many other polymorphic sites in human genomic
DNA are described in commonly owned copending patent
applications, such as PCT/US98/04571, filed March 5, 1998
(incorporated by reference in their entirety for all
purposes). The genomic locations of these sites are known, as
is the nature of the polymorphic forms occurring at the sites.
Many of the known polymorphic sites occur within so-called
expressed sequence tags and are therefore represented in the
transcript of genomic DNA, as well as genomic DNA itself. The
present invention uses polymorphisms within the transcribed
region of a gene as a means to monitor the relative expression
of different allelic forms of the gene. Having identified
alleles of a gene that are expressed at different levels, the
alleles can be further analyzed to locate a second
polymorphism that has a causative role in the different
expression levels. Often, the causative polymorphism is found
outside the coding sequence of a gene; for example, in a 
promoter, other regulatory sequence or an intronic sequence. 

In the present methods, nucleic acid samples from 
individuals are characterized at both the genomic and 
transcriptional levels. The genomic analysis screens genomic 
DNA from an individual to identify one or more genes that are 
heterozygous for a polymorphism occurring within a transcribed 
region of a gene. RNA from the individual is then analyzed to 
determine the relative levels of polymorphic forms in the 
transcript of the heterozygous genes identified by the genomic 
analysis. If the levels of polymorphic forms in the 
transcript of a gene differ significantly from each other, 
further analysis is performed to identify the cause of the 
different levels. It is possible that the polymorphism within
the transcript that is used for monitoring expression levels 
may itself affect expression levels. However, it is more 
likely that the difference in expression levels stems from 
another polymorphic difference between the alleles. Such 
polymorphisms are particularly likely to reside in promoter 
sequences, enhancers, intronic splice sites, or other 
regulatory sequences. 

II. Analyzing Polymorphic Forms at the Genomic Level

Strategies for identification and detection of 
polymorphisms are described in commonly owned USSN 08/831,159, 
EP 730,663, EP 717,113, and PCT US97/02102, filed February 7, 
1997 (incorporated by reference in their entirety for all 
purposes). The present methods usually employ
precharacterized polymorphisms. That is, the genotyping
required by the present methods is usually performed after the
location and nature of polymorphic forms present at a site
have already been determined. The availability of this
information allows sets of probes to be designed for specific
identification of the known polymorphic forms.

In the simplest form of analysis, the forms of a biallelic
polymorphism can be characterized using a pair of
allele-specific probes respectively hybridizing to the two
polymorphic forms. However, analysis is more accurate using
specialized arrays of probes tiled based on the respective
polymorphic forms. Tiling refers to the use of groups of
related immobilized probes, some of which show perfect
complementarity to a reference sequence and others of which
show mismatches from the reference sequence (see EP 730,663).
A typical array for analyzing a known biallelic single
nucleotide polymorphism contains two groups of probes tiled
based on two reference sequences constituting the respective
polymorphic forms.

The first group of probes includes at least a first
set of one or more probes which span the polymorphic site and
are exactly complementary to one of the polymorphic forms.
The group of probes can also contain second, third and fourth
additional sets of probes, which contain probes identical to
probes in the first probe set except at one position referred
to as an interrogation position. When such a probe group is
hybridized with the polymorphic form constituting the
reference sequence, all probes in the first probe set show perfect
hybridization and all of the probes in the other probe sets
show background hybridization levels due to mismatches.

When such a probe group is hybridized with the other
polymorphic form, a different pattern is obtained. That is,
all but one of the probes in the array show a mismatch to the target
and produce only background hybridization. The one probe that
shows perfect hybridization is a probe from the second, third
or fourth probe sets whose interrogation position aligns with
the polymorphic site and is occupied by a base complementary
to the other polymorphic form.

When the probe group is hybridized with a
heterozygous sample in which both polymorphic forms are
present, the patterns for the homozygous polymorphic forms are
superimposed. Thus, the probe group shows distinct and
characteristic hybridization patterns depending on which
polymorphic forms are present and whether an individual is
homozygous or heterozygous.
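
The behaviour of one tiled probe group described in the preceding paragraphs can be reproduced with a small sketch. It is a simplified model under stated assumptions: probes have odd length with the interrogation position at the center, hybridization is treated as all-or-nothing exact complementarity at the probe's tiling coordinates, and the sequences and function names are invented for the example. It does not reproduce the array layouts of EP 730,663.

```python
# Translation table mapping each DNA base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def tile_probe_group(reference, site, probe_len=5):
    """Build one probe group tiled on `reference` around a polymorphic site.

    `probe_len` must be odd; the interrogation position is the probe center.
    For every window spanning the site, four probes are made: one exactly
    complementary to the reference ("first" set) and three carrying each
    other base at the interrogation position ("substituted" sets).
    """
    half = probe_len // 2
    probes = []
    for center in range(site - half, site + half + 1):
        start = center - half
        if start < 0 or start + probe_len > len(reference):
            continue  # probe would run off the reference
        window = reference[start:start + probe_len]
        for base in "ACGT":
            variant = window[:half] + base + window[half + 1:]
            probes.append({
                "start": start,
                "center": center,
                "set": "first" if base == window[half] else "substituted",
                "probe": reverse_complement(variant),
            })
    return probes


def perfect_matches(probes, target):
    """Probes exactly complementary to `target` at their tiling coordinates."""
    hits = []
    for p in probes:
        segment = target[p["start"]:p["start"] + len(p["probe"])]
        if reverse_complement(p["probe"]) == segment:
            hits.append(p)
    return hits


reference = "GATTACAGGCT"    # made-up sequence; polymorphic site at index 5 (C)
alternative = "GATTATAGGCT"  # same sequence with T at the polymorphic site
group = tile_probe_group(reference, site=5)
print(len(perfect_matches(group, reference)))    # 5: every first-set probe
print(len(perfect_matches(group, alternative)))  # 1: one substituted probe
```

For a heterozygous sample, the set of bright probes in this model is the union of the two lists, which mirrors the superimposed pattern described above.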

Typically, an array also contains a second group of 
probes tiled using the same principles as the first group but 
with a reference sequence constituting the other polymorphic 
form. That is, the first probe set in the second group spans 
the polymorphic site and shows perfect complementarity to the
other polymorphic form. Hybridization of the second probe 
group to homozygous or heterozygous target sequences yields a 
mirror image of hybridization patterns from the first group. 
By analyzing the hybridization patterns from both probe 
groups, one can determine with a high accuracy which 
polymorphic form(s) are present in an individual. 

The principles of probe selection and array design 
can readily be extended to analyze more complex polymorphisms 
(see EP 730,663). For example, to characterize a triallelic 
SNP polymorphism, three groups of probes can be designed tiled 
on the three polymorphic forms as described above. As a 
further example, to analyze a diallelic polymorphism involving 
a deletion of a nucleotide, one can tile a first group of 
probes based on the undeleted polymorphic form as the 
reference sequence and a second group of probes based on the 
deleted form as the reference sequence. 

Arrays can also be designed to analyze many
different polymorphisms in many different genes simultaneously
simply by including multiple subarrays of probes. Each
subarray has first and second groups of probes designed for
analyzing a particular polymorphism according to the strategy
described above.

For assay of genomic DNA, virtually any biological
sample (other than pure red blood cells) is suitable. For
example, convenient tissue samples include whole blood, semen,
saliva, tears, urine, fecal material, sweat, buccal, skin and
hair. Genomic DNA is typically amplified before analysis.
Amplification is usually effected by PCR using primers
flanking a suitable fragment, e.g., of 50-^00 nucleotides
containing the locus of the polymorphism to be analyzed. The
target is usually labelled in the course of amplification.
The amplification product can be RNA or DNA, single stranded
or double stranded. If double stranded, the amplification
product is typically denatured before application to an
array. If genomic DNA is analyzed without amplification, it
may be desirable to remove RNA from the sample before applying
it to the array. This can be accomplished by digestion with
DNase-free RNase.

III. Expression Monitoring

The invention monitors the levels of RNA transcripts
expressed from genes of interest. The RNA transcript can be
nuclear RNA, mRNA, rRNA or tRNA. Nuclear RNA contains
intronic sequences that have been spliced out of mRNA.
Analysis of nuclear RNA can be useful in analyzing the effects
on expression of polymorphisms occurring within intronic
regions. In some methods, RNA is monitored directly and in
other methods RNA is monitored indirectly via an amplification
product, such as cDNA or cRNA.

Strategies for analysis and quantification of
transcript are described in detail in commonly owned WO
96/14839 and WO 97/01603. In general, the same probe arrays
that are used for analyzing polymorphic forms in genomic DNA
can be used for analyzing polymorphic forms of transcript.
The hybridization patterns of the probe arrays can be analyzed
in the same manner for genomic and RNA (or RNA-derived)
targets. Comparison of the hybridization intensities of the
first probe group that are perfectly matched with one
polymorphic form to the hybridization intensities of the
second probe group that are perfectly matched with the second
polymorphic form indicates approximately the relative
proportions of the polymorphic forms in the transcript.

In some instances, it can be useful to compare the
ratio of hybridization intensities of perfectly matched probes
from the first and second probe groups for genomic DNA and RNA
targets (or amplification products thereof). Preferably, the
comparison is performed between like forms of amplification
products (i.e., both DNA or both RNA). In genomic DNA from a
diploid individual, the polymorphic forms at a heterozygous
gene are expected to be present in equal molar ratio.
However, in practice, the ratio of hybridization intensities
may differ somewhat from the expected molar ratio due to, for
example, base-composition effects on hybridization intensity.
By comparing the ratios of hybridization intensities for
genomic DNA and RNA (or amplification products thereof) to the
same groups of probes, factors other than molar ratio of
polymorphic forms that might influence hybridization
intensities can largely be eliminated from the analysis. If
the ratio of hybridization intensities differs significantly
for the genomic and RNA targets (or amplification products
thereof), then it can be concluded that the polymorphic forms
are differently expressed in the transcript.
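
The ratio comparison described above can be sketched as follows: the RNA intensity ratio between the two probe groups is divided by the genomic DNA ratio for the same probe groups, so that probe-specific effects common to both measurements cancel. Averaging the perfectly matched probes within each group, the function names and the numbers are assumptions of this sketch; the text does not specify how "significantly" different is judged, so no threshold is applied.

```python
def mean(values):
    return sum(values) / len(values)


def group_ratio(first_group, second_group):
    """Ratio of mean perfect-match intensities, first group over second group."""
    return mean(first_group) / mean(second_group)


def normalized_allelic_ratio(dna_first, dna_second, rna_first, rna_second):
    """RNA allelic ratio divided by the genomic DNA ratio for the same probes.

    In a heterozygote the genomic ratio reflects equal molar amounts of the
    two forms, so a normalized value far from 1 suggests unequal expression.
    """
    return group_ratio(rna_first, rna_second) / group_ratio(dna_first, dna_second)


# Made-up intensities for a heterozygous individual.
dna_first, dna_second = [1200.0, 1000.0], [1000.0, 1000.0]   # DNA ratio 1.1
rna_first, rna_second = [2200.0, 2200.0], [1000.0, 1000.0]   # RNA ratio 2.2
print(normalized_allelic_ratio(dna_first, dna_second, rna_first, rna_second))
# close to 2.0: the first form is roughly twice as abundant in transcript
```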

Some arrays contain additional probes for measuring
the level of transcript of a gene without distinguishing
between the polymorphic forms. These probes exhibit perfect
complementarity to a segment of the gene distal from the
polymorphism used to distinguish polymorphic forms. The
presence and level of the transcript can be inferred from the
hybridization intensities of these probes, optionally relative
to control probes lacking complementarity to the target and
designed to measure the background level of hybridization
intensity.

RNA transcript for analysis is isolated from a
biological sample obtained from a biological tissue or fluid
in which the gene of interest is expressed. Samples include
sputum, blood, blood cells (e.g., white cells), tissue or fine
needle biopsy samples, urine, peritoneal fluid, and pleural
fluid, or cells therefrom. Biological samples may also
include sections of tissues such as frozen sections taken for
histological purposes.

Methods of isolating total mRNA are described in
Chapter 3 of Laboratory Techniques in Biochemistry and
Molecular Biology: Hybridization With Nucleic Acid Probes,
Part I. Theory and Nucleic Acid Preparation, P. Tijssen, ed.,
Elsevier, N.Y. (1993).

Frequently, it is desirable to amplify RNA prior to
hybridization. The amplification product can be RNA or DNA,
single-stranded, or double-stranded. In one procedure, mRNA
can be reverse transcribed with a reverse transcriptase and a
primer consisting of oligo dT and a sequence encoding the
phage T7 promoter to provide single stranded DNA template.
The second DNA strand is polymerized using a DNA polymerase.
After synthesis of double-stranded cDNA, T7 RNA polymerase is
added and RNA is transcribed from the cDNA template.
Successive rounds of transcription from each single cDNA
template result in amplified RNA. Alternatively, cDNA can be
amplified to generate double stranded amplicon, and one strand
of the amplicon can be isolated, i.e., using a biotinylated
primer that allows capture of the undesired strand on
streptavidin beads. Alternatively, asymmetric PCR can be used
to generate a single-stranded target.

Typically, amplification product is labelled either
in the course of amplification or subsequently. If RNA
amplification product is to be hybridized simultaneously with
genomic DNA, or an amplification product thereof, to an array,
then the two targets are differentially labelled. A variety
of different fluorescent labels are available. For example,
one sample can be labelled with fluorescein and the other with
biotin, which can be stained with phycoerythrin-streptavidin
after hybridization. Two target samples can be diluted, if
desired, prior to hybridization to equalize fluorescence
intensities.

Detailed protocols for PCR are provided in PCR
Protocols, A Guide to Methods and Applications, Innis et al.,
Academic Press, Inc., N.Y. (1990). Other suitable
amplification methods include the ligase chain reaction (LCR)
(see Wu and Wallace, Genomics, 4: 560 (1989), Landegren et
al., Science, 241: 1077 (1988) and Barringer et al., Gene,
89: 117 (1990)), transcription amplification (Kwoh et al.,
Proc. Natl. Acad. Sci. USA, 86: 1173 (1989)), and
self-sustained sequence replication (Guatelli et al., Proc. Natl.
Acad. Sci. USA, 87: 1874 (1990)). In some methods, a known
quantity of a control sequence is co-amplified using the same
primers to provide an internal standard that may be used to
calibrate the PCR reaction to ensure that the amplification
products are produced in approximately the same molar ratio as
the starting ratio of templates. The probe array then
includes probes specific to the internal standard for
quantification of the amplified nucleic acid.

IV. Correlation of Genotype with Expression Levels

Having identified alleles of a gene that are 
expressed at different levels, the alleles can be further 
analyzed to identify a difference between them that accounts 
for the different expression levels. The difference may 
reside in the same polymorphism that was used to distinguish 
the different allelic forms in the analyses described above.
However, more typically, the difference in expression levels
resides in a second polymorphism located in a promoter,
enhancer or other regulatory region. Such polymorphisms can
be identified by sequencing the regulatory regions of the
differentially expressed alleles and identifying sequence
differences between the alleles.

A possible causative role of a polymorphism within a 
regulatory sequence in differential expression of alleles can 
be analyzed by both molecular biological and genetic 
approaches. For example, if differentially expressed alleles 
differ from each other at a polymorphic site within a 
promoter, the different forms of the promoter can be cloned 
and placed in operable linkage with a reporter gene. If the 
reporter gene is expressed at different levels from the two 
forms of the promoter, it is likely that the polymorphism 
within the promoter has a causative role in the observed
differential expression levels of allelic forms of the gene
with which it is naturally associated. Similar reporter 
assays can be devised to assess the effect of polymorphisms in 
other regulatory sequences. 

Polymorphisms within promoters and other regulatory 
sequences can also be characterized by association analysis. 
Association analysis identifies correlations between 
polymorphic forms and a population of individuals who have 
been tested for the presence or absence of a phenotypic trait 
of interest and for polymorphic marker sets. To perform such
analysis, the presence or absence of a polymorphism is
determined for a set of individuals, some of whom exhibit
a particular trait and some of whom lack the
trait. The alleles of the polymorphism are then reviewed to
determine whether the presence or absence of a particular
allele is associated with the trait of interest. Correlation
can be performed by standard statistical methods such as a
chi-squared test, and statistically significant correlations
between a polymorphic form and phenotypic characteristics are
noted.
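
The chi-squared test mentioned above can be sketched as follows. This is only an illustrative sketch, assuming the data have been reduced to a 2x2 table of counts (allele present or absent against trait present or absent); the function name and the counts are hypothetical and do not come from the specification.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared statistic and p-value (1 degree of freedom)
    for a 2x2 table [[a, b], [c, d]] of observed counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if allele and trait were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Upper-tail probability of the chi-squared distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: rows = allele present / absent,
# columns = trait present / absent.
counts = [[30, 10], [15, 45]]
chi2, p = chi_squared_2x2(counts)
print(f"chi-squared = {chi2:.2f}, p = {p:.2e}")
```

A small p-value suggests that the allele and the trait are not independent in the sample; it does not by itself establish a causative role.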

V. Alternative Method of Correlating Expression Levels with Genotype

In an alternative or additional approach, a 
population of individuals is genotyped at one or more 
polymorphic sites within a gene including flanking sequences.
Expression levels of the gene transcript are then determined
in individuals without distinguishing between the polymorphic
forms. Optionally, expression levels from different
individuals can be classified into groups or clusters
suggested by the data, not defined a priori, such that
isolates in a given cluster tend to be similar and isolates in
different clusters tend to be dissimilar. See commonly owned
USSN 08/797,812, filed February 7, 1997 (incorporated by
reference in its entirety for all purposes). The population
of individuals on which the analysis is performed should
preferably be matched for characteristics that might have
indirect effects on expression levels, such as age, sex and
ethnicity, and expression levels should be determined from the
same tissue type. The genotype of an individual with respect
to one or more polymorphisms within the gene is then
correlated with the expression level of gene transcript in the
same individual throughout the population. Polymorphic forms
showing strong correlation with expression levels of
transcript may have a causative role in determining the
expression level. This role can be further investigated using
the molecular biological and genetic approaches described
above.
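
One way to picture the genotype-to-expression correlation described above is a Pearson correlation between allele dosage and transcript level. The sketch below is an assumption made for illustration: the specification does not name a particular statistic, and the coding of genotypes as 0, 1 or 2 copies of one allele, the function name and the data are all hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: one entry per individual, all from the same tissue type.
# Genotype is coded as the number of copies (0, 1 or 2) of one polymorphic form.
genotypes = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]  # arbitrary expression units

r = pearson_r(genotypes, expression)
print(f"r = {r:.3f}")
```

A coefficient near +1 or -1 would flag the polymorphic form as a candidate for the follow-up reporter and association analyses; a matched population is still needed for the result to be interpretable.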

VI. Association Analysis 

Phenotypic traits suitable for association analysis 
include diseases that have known but hitherto unmapped genetic 
components (e.g., agammaglobulinemia, diabetes insipidus,
Lesch-Nyhan syndrome, muscular dystrophy, Wiskott-Aldrich
syndrome, Fabry's disease, familial hypercholesterolemia,
polycystic kidney disease, hereditary spherocytosis, von
Willebrand's disease, tuberous sclerosis, hereditary
hemorrhagic telangiectasia, familial colonic polyposis,
Ehlers-Danlos syndrome, osteogenesis imperfecta, and acute
intermittent porphyria). Phenotypic traits also include
symptoms of, or susceptibility to, multifactorial diseases of
which a component is or may be genetic, such as autoimmune
diseases, inflammation, cancer, diseases of the nervous
system, and infection by pathogenic microorganisms. Some
examples of autoimmune diseases include rheumatoid arthritis,
multiple sclerosis, diabetes (insulin-dependent and
non-insulin-dependent), systemic lupus erythematosus and Graves' disease.
Some examples of cancers include cancers of the bladder,
brain, breast, colon, esophagus, kidney, leukemia, liver,
lung, oral cavity, ovary, pancreas, prostate, skin, stomach
and uterus. Phenotypic traits also include characteristics 
such as longevity, appearance (e.g., baldness, obesity), 
strength, speed, endurance, fertility, and susceptibility or 
receptivity to particular drugs or therapeutic treatments. 

Such correlations can be exploited in several ways.
In the case of a strong correlation between a polymorphic form
and a disease for which treatment is available, detection of
the polymorphic form set in a human or animal patient may
justify immediate administration of treatment, or at least the
institution of regular monitoring of the patient. Detection
of a polymorphic form correlated with serious disease in a
couple contemplating a family may also be valuable to the
couple in their reproductive decisions. For example, the
female partner might elect to undergo in vitro fertilization
to avoid the possibility of transmitting such a polymorphism
from her husband to her offspring. In the case of a weaker,
but still statistically significant correlation between a
polymorphic set and human disease, immediate therapeutic
intervention or monitoring may not be justified.
Nevertheless, the patient can be motivated to begin simple
life-style changes (e.g., diet, exercise) that can be
accomplished at little cost to the patient but confer
potential benefits in reducing the risk of conditions to which
the patient may have increased susceptibility by virtue of
variant alleles. Identification of a polymorphic set in a
patient correlated with enhanced receptiveness to one of
several treatment regimes for a disease indicates that this
treatment regime should be followed.

VII. Probe Array Design and Construction

VLSIPS™ technology provides methods for synthesizing 
arrays of many different oligonucleotide probes that occupy a 
very small surface area. See US 5,143,854 and WO 90/15070. 
For example, high density arrays can be produced which 
comprise greater than about 100, preferably greater than about 
1000, 16,000, 65,000, 250,000 or 1,000,000 different 
oligonucleotide probes. The oligonucleotide probes range from 
about 5 to about 50 or about 5 to about 45 nucleotides, more 
preferably from about 10 to about 40 nucleotides and most 
preferably from about 15 to about 40 nucleotides in length. 
In some embodiments, the oligonucleotide probes are 20 or 25 
nucleotides in length. The oligonucleotide probes are usually 
less than 50 nucleotides in length, generally less than 46 
nucleotides, more generally less than 41 nucleotides, most
generally less than 36 nucleotides, preferably less than 31
nucleotides, more preferably less than 26 nucleotides, and most
preferably less than 21 nucleotides in length. The probes can 
also be less than 16 nucleotides or less than even 11 
nucleotides in length. 

The location and sequence of each different 
oligonucleotide probe sequence in the array are generally 
known. Moreover, the large number of different probes can 
occupy a relatively small area providing a high density array 
having a probe density of generally greater than about 60, 
100, 600, 1000, 5,000, 10,000, 40,000, 100,000, or 400,000 
different oligonucleotide probes per cm². The small surface
area of the array (often less than about 10 cm², preferably
less than about 5 cm², more preferably less than about 2 cm²,
and most preferably less than about 1.6 cm²) permits uniform
hybridization conditions, such as temperature regulation and 
salt content. 

Finally, because of the small area occupied by the 
high density arrays, hybridization may be carried out in 
extremely small fluid volumes (e.g., 250 µl or less, more
preferably 100 µl or less, and most preferably 10 µl or less).
In small volumes, hybridization may proceed very rapidly. In 
addition, hybridization conditions are extremely uniform 
throughout the sample, and the hybridization format is 
amenable to automated processing. 

All publications and patent applications cited above 
are incorporated by reference in their entirety for all 
purposes to the same extent as if each individual publication
or patent application were specifically and individually 
indicated to be so incorporated by reference. Although the 
present invention has been described in some detail by way of 
illustration and example for purposes of clarity and 
understanding, it will be apparent that certain changes and 
modifications may be practiced within the scope of the 
appended claims. 
What is claimed is:

1. A method of monitoring expression levels of
different polymorphic forms of a gene, comprising:
analyzing genomic DNA from an individual to
determine the presence of heterozygous polymorphic forms at a
polymorphic site within a transcribed sequence of a gene of
interest;
analyzing RNA from a tissue of the individual in
which the gene is expressed to determine relative proportions
of polymorphic forms in transcript of the gene.

2. The method of claim 1, wherein analyzing genomic
DNA comprises amplifying a segment of genomic DNA from a
sample and hybridizing the amplified genomic DNA to an array
of immobilized probes.

3. The method of claim 2, wherein the array of
immobilized probes comprises a first probe group comprising
one or more probes exactly complementary to a first
polymorphic form of the gene and a second probe group
comprising one or more probes exactly complementary to a
second polymorphic form of the gene.

4. The method of claim 1, wherein analyzing the
RNA comprises reverse transcribing and amplifying mRNA
expressed from the gene to produce an amplified nucleic acid
and hybridizing the amplified nucleic acid to an array of
immobilized probes.

5. The method of claim 4, wherein the amplified
nucleic acid is cDNA.

6. The method of claim 4, wherein the array of
immobilized probes comprises a first probe group comprising
one or more probes exactly complementary to a first
polymorphic form of the gene, and a second probe group comprising
one or more probes exactly complementary to a second
polymorphic form of the gene.



7. The method of claim 1, wherein the genomic DNA
and the RNA are analyzed by hybridizing the genomic DNA or an
amplification product thereof, and the RNA or an amplification
product thereof, to the same array of immobilized probes
comprising a first probe group comprising one or more probes
exactly complementary to a first polymorphic form of the gene,
and a second probe group comprising one or more probes exactly
complementary to a second polymorphic form of the gene.



8. The method of claim 7, wherein the genomic DNA,
or amplification product, and the RNA, or amplification
product, bear different labels and are hybridized
simultaneously to the array.



9. The method of claim 7, further comprising
comparing a genomic DNA hybridization intensity of the first
probe group to the second group to determine a genomic
hybridization ratio, and comparing an RNA hybridization
intensity of the first group to the second group to determine
an RNA hybridization ratio, whereby a difference in the
genomic DNA and RNA ratios indicates that the polymorphic
forms of the gene are expressed at different levels in the
individual.

10. The method of claim 1, further comprising
sequencing a nontranscribed region of the gene to identify a
second polymorphic site in a promoter or enhancer region of
the gene.

11. A method of monitoring expression levels of
different polymorphic forms of a collection of genes,
comprising:
hybridizing genomic DNA, or an amplification product
thereof, from an individual to an array of immobilized probes
comprising a subarray of probes for each gene in the
collection, wherein each subarray comprises a first group of
one or more probes exactly complementary to a first
polymorphic form of the gene and a second group of one or more
probes exactly complementary to a second polymorphic form of
the gene;
analyzing the relative hybridization of the first
and second group of probes to the genomic DNA or amplification
product thereof for each subarray to identify heterozygous
genes in the individual;
hybridizing RNA or an amplification product
thereof from the individual to the array of immobilized
probes;
comparing the hybridization intensities of the first
and second groups of probes to the RNA or amplification
product to identify a subset of the heterozygous genes for
which different polymorphic forms are expressed at different
levels.

12. The method of claim 11, wherein the collection
of genes comprises at least 100 genes.

13. The method of claim 11, wherein the collection
of genes comprises at least 1000 genes. 

14. The method of claim 11, wherein the collection 
of genes comprises at least 100,000 genes. 

15. The method of claim 11, further comprising 
sequencing a nontranscribed region of a gene in the subset to 
identify a further polymorphism in a promoter, enhancer or 
intronic sequence of the gene. 
Polymorphisms and New Genes in the Region of the Human Hemochromatosis Gene

BACKGROUND OF THE INVENTION

Hereditary hemochromatosis (HH) is an inherited disorder of iron metabolism wherein 
the body accumulates excess iron. In symptomatic individuals, this excess iron leads to deleterious 
effects by being deposited in a variety of organs leading to their failure, and resulting in cirrhosis, 
diabetes, sterility, and other serious illnesses. The gene which is defective in this disease was
disclosed in copending U.S.S.N. 08/652,265.

Fine structure mapping of the region to which the gene responsible for HH, HFE
(denoted HH or HFE in some publications), was mapped makes possible the identification of candidate 
sequences comprising the HFE gene, along with structural elements for regulation and expression and 
neighboring genes. 

A variety of techniques is available for fine structure mapping, including direct cDNA 
selection, exon-trapping, and genomic sample sequencing. The direct selection approach (Lovett et al. 
Proc. Natl. Acad. Sci. USA 88:9628-9632 (1991)) involves the hybridization of cDNA fragments to
genomic DNA. This technique is extremely sensitive and capable of isolating portions of rare
transcripts. Exon-trapping (Church et al. Nature Genetics 6:98-105 (1994)) recovers spliced introns
from in vivo expressed genomic DNA clones and produces candidate exons without requiring any prior
knowledge of the target's gene expression. High-throughput genomic DNA sequencing with
comparison of the sequence data to databases of expressed sequences has also been used, such as
in the positional cloning of the Werner syndrome gene (Yu et al. Science 272:258-262 (1996)) and in
cloning by homology of the second Alzheimer's disease gene on chromosome 1 (Levy-Lahad et al.
Science 269:973-977 (1995)).

HH is typically inherited as a recessive trait; in the current state of knowledge, 
homozygotes carrying two defective copies of the gene are most frequently affected by the disease. In 
addition, heterozygotes for the HFE gene are more susceptible to sporadic porphyria cutanea tarda 
and potentially other disorders (Roberts et al. Lancet 349:321-323 (1997)). It is estimated that
approximately 10-15% of Caucasians carry one copy of the HFE gene mutation and that there are
about one million homozygotes in the United States. HH, thus, represents one of the most common
genetic disease mutations in Caucasian individuals. Although ultimately HH produces debilitating 
symptoms, the majority of homozygotes and heterozygotes have not been diagnosed. 

The need for such diagnostics is documented, for example, in Barton, J.C. et al.
Nature Medicine 2:394-395 (1996); Finch, C.A. West J Med 153:323-325 (1990); McKusick, V.
Mendelian Inheritance in Man pp. 1882-1887, 11th ed. (Johns Hopkins University Press, Baltimore
(1994)); Report of a Joint World Health Organization/Hemochromatosis Foundation/French
Hemochromatosis Association Meeting on the Prevention and Control of Hemochromatosis (1993);
Edwards, C.Q. et al. New Engl J Med 328:1616-1620 (1993); Bacon, B.R. New Engl J Med 326:125-127
(1992); Balan, V. et al. Gastroenterology 107:453-459 (1994); Phatak, P.D. et al. Arch Int Med
154:769-776 (1994).

A single mutation in the HFE gene, designated 24d1 in copending U.S.S.N.
08/630,912, gave rise to the majority of disease-causing chromosomes present in the population today.
This is referred to herein as the "common" or "ancestral" or "common ancestral" mutation. These
terms are used interchangeably. It appears that about 80% to 90% of all HH patients carry at least one
copy of the common ancestral mutation which is closely linked to specific alleles of certain genetic
markers close to this ancestral HFE gene defect. These markers are, as a first approximation, in the
allelic form in which they were present at the time the ancestral HFE mutation occurred. See, for
example, Simon, M. et al. Am J Hum Genet 41:89-105 (1987); Jazwinska, E.G. et al. Am J Hum Genet
53:242-257 (1993); Jazwinska, E.G. et al. Am J Hum Genet 56:428-433 (1995); Worwood, M. et al. Brit
J Hematol 86:863-866 (1994); Summers, K.M. et al. Am J Hum Genet (1989).

Several polymorphic markers in the HFE region have been described and shown to 
have alleles that are associated with HH disease. These markers include the published microsatellite
markers D6S258, D6S306 (Gyapay, G. et al. Nature Genetics 7:246-339 (1994)), D6S265 (Worwood,
M. et al. Brit J Hematol 86:833-846 (1994)), D6S105 (Jazwinska, E.G. et al. Am J Hum Genet
53:242-257 (1993); Jazwinska, E.G. et al. Am J Hum Genet 56:428-433 (1995)), D6S1001 (Stone, C.
et al. Hum Molec Genet 3:2043-2046 (1994)), D6S1260 (Raha-Chowdhury et al. Hum Molec Genet
4:1869-1874 (1995)) as well as additional microsatellite and single-nucleotide-polymorphism markers
disclosed in co-pending PCT application WO 96/06583, the disclosure of which is hereby incorporated
by reference in its entirety. Additionally, copending U.S.S.N. 08/630,912 disclosed additional markers 
24d2 and 24d7. 

The symptoms of HH are often similar to those of other conditions, and the severe
effects of the disease often do not appear immediately. Accordingly, it would be desirable to provide a
method to identify persons who may be destined to become symptomatic in order to intervene in time
to prevent excessive tissue damage associated with iron overload. One reason for the lack of early
diagnosis is the inadequacy of presently available diagnostic methods to ascertain which individuals are
at risk, especially while such individuals are presymptomatic.

Although blood iron parameters can be used as a screening tool, a confirmed
diagnosis often employs liver biopsy which is undesirably invasive, costly, and carries a risk of mortality.
Thus, there is a clear need for the development of an inexpensive and noninvasive diagnostic test for
detection of homozygotes and heterozygotes in order to facilitate diagnosis in symptomatic individuals,
provide presymptomatic detection to guide intervention in order to prevent organ damage, and for
identification of heterozygote carriers.

Furthermore, a need exists for both methods for fine structure mapping and a fine
structure map of the region of the chromosome to which the HH locus maps. This and other needs
are addressed by the present invention.
SUMMARY OF THE INVENTION

One aspect of the invention is an oligonucleotide comprising at least 8 to about 100
consecutive bases from the sequence of Figure 9, or the complement of the sequence, wherein the at
least 8 to about 100 consecutive bases includes at least one polymorphic site of Table 1.
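
The constraint that a run of consecutive bases must include a polymorphic site can be illustrated with a short sketch. Everything here is hypothetical: the toy sequence stands in for the sequence of Figure 9, the site position stands in for an entry of Table 1 (neither is reproduced in this passage), and the function name is invented for the example.

```python
def oligos_spanning_site(sequence, site_index, length):
    """Return every run of `length` consecutive bases from `sequence`
    that includes the base at 0-based position `site_index`."""
    if not 0 <= site_index < len(sequence):
        raise ValueError("site_index lies outside the sequence")
    if not 1 <= length <= len(sequence):
        raise ValueError("length must be between 1 and len(sequence)")
    # Earliest and latest start positions whose window still covers the site.
    first_start = max(0, site_index - length + 1)
    last_start = min(site_index, len(sequence) - length)
    return [sequence[s:s + length] for s in range(first_start, last_start + 1)]


toy_sequence = "ACGTTGCATGCA"   # hypothetical stand-in, not the Figure 9 sequence
site = 5                        # hypothetical 0-based polymorphic position
oligos = oligos_spanning_site(toy_sequence, site, 8)
for oligo in oligos:
    print(oligo)
```

The same enumeration applies for any length from 8 to about 100 bases; the complement of each window could be taken in the same way.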

Another aspect of the invention is an oligonucleotide pair selected from the sequence
of Figure 9 or its complement for amplification of a polymorphic site of Table 1.

Another aspect of the invention is an isolated nucleic acid molecule comprising about
100 consecutive bases to about 235 kb substantially identical to the sequence of Figure 9, wherein the
DNA molecule comprises at least one polymorphic site of Table 1.

Another aspect of the invention is a method to determine the presence or absence of 
the common hereditary hemochromatosis (HFE) gene mutation in an individual comprising: 
providing DNA or RNA from the individual; and 

assessing the DNA or RNA for the presence or absence of a haplotype of
Table 1,

wherein, as a result, the absence of a haplotype of Table 1 indicates the likely
absence of the HFE gene mutation in the genome of the individual and the presence of the haplotype
indicates the likely presence of the HFE gene mutation in the genome of the individual. 

Another aspect of the invention is a method to determine the presence or absence of 
the common hereditary hemochromatosis (HFE) gene mutation in an individual comprising: 
providing DNA or RNA from the individual; and 
assessing the DNA or RNA for the presence or absence of a genotype 
defined by a polymorphic allele of Table 1,

wherein, as a result, the absence of a genotype defined by a polymorphic 
allele of Table 1 indicates the likely absence of the HFE gene mutation in the genome of the individual 
and the presence of the genotype indicates the likely presence of the HFE gene mutation in the 
genome of the individual.

Another aspect of the invention is a culture of lymphoblastoid cells having the
designation ATCC CRL-12371.

One aspect of the invention is an isolated nucleic acid sequence comprising a nucleic 
acid sequence substantially identical to BTF1. 

A further aspect of the invention is an isolated nucleic acid sequence comprising a
nucleic acid sequence substantially identical to BTF2.

A further aspect of the invention is an isolated nucleic acid sequence comprising a
nucleic acid sequence substantially identical to BTF3.

A further aspect of the invention is an isolated nucleic acid sequence comprising a
nucleic acid sequence substantially identical to BTF4.

A further aspect of the invention is an isolated nucleic acid sequence comprising a
nucleic acid sequence substantially identical to BTF5.

A further aspect of the invention is an isolated nucleic acid sequence comprising a
nucleic acid sequence substantially identical to NPT3.

A further aspect of the invention is an isolated nucleic acid sequence comprising a
nucleic acid sequence substantially identical to NPT4. 

A further aspect of the invention is an isolated nucleic acid sequence comprising a 
nucleic acid sequence substantially identical to RoRet. 

Additional aspects of the invention include nucleic acid sequences that are cDNAs,
polypeptides encoded by the nucleic acids of the invention and antibodies specifically immunoreactive
thereto, vectors comprising the nucleic acid sequences of the invention, and host cells stably
transfected with the nucleic acids of the invention. 

A further aspect of the invention is an isolated nucleic acid sequence comprising at
least 18 contiguous nucleotides substantially identical to at least 18 contiguous nucleotides of BTF1.

A further aspect of the invention is an isolated nucleic acid sequence comprising at
least 18 contiguous nucleotides substantially identical to at least 18 contiguous nucleotides of BTF2.

A further aspect of the invention is an isolated nucleic acid sequence comprising at
least 18 contiguous nucleotides substantially identical to at least 18 contiguous nucleotides of BTF3.

A further aspect of the invention is an isolated nucleic acid sequence comprising at
least 18 contiguous nucleotides substantially identical to at least 18 contiguous nucleotides of BTF4.

A further aspect of the invention is an isolated nucleic acid sequence comprising at
least 18 contiguous nucleotides substantially identical to at least 18 contiguous nucleotides of BTF5.

A further aspect of the invention is an isolated nucleic acid sequence comprising at
least 18 contiguous nucleotides substantially identical to at least 18 contiguous nucleotides of NPT3.

A further aspect of the invention is an isolated nucleic acid sequence comprising at
least 18 contiguous nucleotides substantially identical to at least 18 contiguous nucleotides of NPT4.

A further aspect of the invention is an isolated nucleic acid sequence comprising at
least 18 contiguous nucleotides substantially identical to at least 18 contiguous nucleotides of RoRet.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 depicts a combination genetic, physical and transcription map of the HFE 
gene region. The first line shows the relative positions of selected genetic markers that define the HFE
region. The heavy bar below represents the YAC clone used in the direct selection experiment. The
order and positions of the bacterial clones employed in the exon-trapping and sample sequencing are
indicated under the YAC. The thin bar under the bacterial clones represents the approximate locations
of a subset of the expressed sequence fragments mapped to the contig. The thicker bars show the
location of the cDNAs cloned. Two regions are bracketed: the butyrophilin family of genes (BTF), and
the region where complete genomic sequencing was carried out. 

Figure 2 is a schematic of the 250 kb of genomic sequence including the HFE gene. 
Both the structure of the overall cDNA (top) and that corresponding to the coding regions (bottom), as 
well as the direction of transcription are shown. The positions of the histone genes, the zinc α-2
glycoprotein pseudogene, and the ESTs are also shown. 

Figure 3 depicts an alignment of the predicted amino acid sequence of the BTF 
proteins. Sequences were aligned in a pair-wise fashion using CLUSTAL W (Thompson et al. Nucl.
Acids Res. 22:4673-4680) to deduce the most parsimonious arrangement. The asterisks under the
alignment represent amino acids conserved in all 6 proteins; the "dots" represent conserved amino
acid substitutions. Boxed are the regions within the proteins which correspond to three conserved
motifs: 1) the B-G domain, 2) the transmembrane domain (TM), and 3) the B30-2 exon domain.

Figure 4, panel (A) depicts a Northern blot analysis of representative members of the 
two groups of BTF proteins, BTF1 and BTF5. BTF1 hybridized to all tissues on the blot as a major 
transcript at 2.9 kb and a minor one at 5.0 kb. BTF5 hybridized to several transcripts ranging between 
4.0 and 3.1 kb and had a similar expression profile to BTF1. Autoradiography was for 24 hours. The
β-actin hybridization demonstrated the variation in poly (A)+ RNA between the lanes. Autoradiography
was for 1 hour. In panel (B), RT-PCR analysis demonstrated that the expression of both genes was
widespread. Included in the (+) lane are cDNA 21 and 44 as positive controls; the (-) lane represents
the no-DNA control. Amplification using primers for the RFP gene (Isomura et al. Nucleic Acids Res.
20:5305-5310 (1992)) controlled for the integrity of the cDNA. All first strand cDNAs were checked for
contaminating genomic DNA amplification by carrying out an identical experiment excluding the 
reverse transcriptase. In all cases, no amplification was obtained (data not shown). 

Figure 5(A) depicts an alignment of the predicted amino acid sequence of the RoRet
gene to the 52 kD Ro/SSA auto-antigen protein. The asterisks under the alignment represent
conserved amino acids; the "dots" represent conserved amino acid substitutions. The putative DNA
binding cysteine-rich domain and the B30-2 exon domain are boxed. Figure 5(B) depicts an alignment 
of the predicted amino acid sequence of the two novel putative sodium phosphate transport proteins to 
that of the NPT1. 

Figure 6, panel (A) depicts a Northern blot analysis of the RoRet gene. The RoRet
cDNA hybridized to 4 different transcripts, ranging from 7.1 kb to 2.2 kb. Autoradiography was
performed for 4 days. The re-hybridization of the blot with a β-actin probe showed the variation in poly
(A)+ RNA between the lanes. Autoradiography was for 1 hour. Panel (B) depicts RT-PCR analysis of
the RoRet gene. Included in the (+) lane was a cDNA 27 positive control. Weak amplification of the
correct size was observed in the small intestine, kidney and liver. The other tissues were negative as
was the no DNA control lane (-). The RFP primers demonstrated the integrity of the cDNA. Panel (C)
depicts Northern blot analysis of NPT3 and NPT4. NPT3 was expressed at high abundance in the
heart and muscle as a single 72 kb transcript. Lesser amounts were found in the other tissues. The
expression pattern of NPT4 was more restricted, being found only in the liver and kidney as a smear of
transcripts ranging from 2.6 to 1.7 kb. Panel (D) depicts RT-PCR analysis of the NPT3 and NPT4
genes. Included in the (+) lane were the respective cDNA 22E and 22B positive controls. The NPT3
gene was expressed as the proper size PCR fragment in kidney, liver, spleen and testis. A smaller
fragment was detected in all tissues with the exception of the liver. The no DNA control lane (-) was
negative. NPT4 was expressed as the proper size fragment in the small intestine, kidney, liver and
testis. Larger and smaller size fragments were found in all other tissues with the exception of the brain.
For both genes these different size fragments may indicate alternative splice events. The no DNA
control lane (-) was negative. The RFP primers demonstrated the integrity of the cDNA.

Figure 7 depicts the sequences of cDNA 21 (BTF1), cDNA 29 (BTF3), cDNA 23
(BTF4), cDNA 44 (BTF5), cDNA 32 (BTF2), cDNA 27 (RoRet), cDNA 22B (NPT3), cDNA 22E (NPT4).

Figure 8 depicts the nucleotide sequence of approximately 235 kb in the HFE
subregion from an unaffected individual.

Figure 9 depicts the nucleotide sequence of approximately 235 kb in the HFE 
subregion from an HH affected individual. Polymorphic sites in the HH affected individual determined
by comparing a sequence of the corresponding region from an HH unaffected individual are listed and
described in Table 1. 

DETAILED DESCRIPTION

A. Definitions 

Abbreviations for the twenty naturally occurring amino acids follow conventional
usage. In the polypeptide notation used herein, the left-hand direction is the amino terminal direction
and the right-hand direction is the carboxyl-terminal direction, in accordance with standard usage and
convention. Similarly, unless specified otherwise, the left hand end of single-stranded polynucleotide
sequences is the 5' end; the left hand direction of double-stranded polynucleotide sequences is
referred to as the 5' direction. The direction of 5' to 3' addition of nascent RNA transcripts is referred to
as the transcription direction; sequence regions on the DNA strand having the same sequence as the
RNA and which are 5' to the 5' end of the RNA transcript are referred to as "upstream sequences";
sequence regions on the DNA strand having the same sequence as the RNA and which are 3' to the 3'
end of the RNA transcript are referred to as "downstream sequences".

The term "nucleic acids", as used herein, refers to either DNA or RNA. "Nucleic acid
sequence" or "polynucleotide sequence" refers to a single- or double-stranded polymer of
deoxyribonucleotide or ribonucleotide bases read from the 5' to the 3' end. It includes both
self-replicating plasmids, infectious polymers of DNA or RNA and nonfunctional DNA or RNA. The
complement of any nucleic acid sequence of the invention is understood to be included in the definition
of that sequence.

"Nucleic acid probes" may be DNA or RNA fragments. DNA fragments can be
prepared, for example, by digesting plasmid DNA, or by use of PCR, or synthesized by either the
phosphoramidite method described by Beaucage and Carruthers, Tetrahedron Lett. 22:1859-1862
(1981), or by the triester method according to Matteucci et al., J. Am. Chem. Soc. 103:3185 (1981),
both incorporated herein by reference. A double-stranded fragment may then be obtained, if desired,
by annealing the chemically synthesized single strands together under appropriate conditions or by
synthesizing the complementary strand using DNA polymerase with an appropriate primer sequence.
Where a specific sequence for a nucleic acid probe is given, it is understood that the complementary
strand is also identified and included. The complementary strand will work equally well in situations
where the target is a double-stranded nucleic acid.
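
By way of illustration only, the complementary strand of a given probe can be derived mechanically. The following is a minimal sketch, assuming probes are written 5' to 3' using the letters A, C, G, T and N; the function name and the example probe are hypothetical and are not taken from the specification.

```python
# Translation table for DNA bases; N (an undetermined base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand of `seq`, written 5' to 3'."""
    # Complement every base, then reverse so the result also reads 5' to 3'.
    return seq.translate(_COMPLEMENT)[::-1]


# Hypothetical probe sequence, used for illustration only.
example_probe = "GTTGCCAAC"
example_complement = reverse_complement(example_probe)
```

Applying the function twice returns the original probe, which is a convenient sanity check when both strands are listed.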

The phrase "selectively hybridizing to" refers to a nucleic acid probe that hybridizes,
duplexes or binds only to a particular target DNA or RNA sequence when the target sequences are
present in a preparation of total cellular DNA or RNA. "Complementary" or "target" nucleic acid
sequences refer to those nucleic acid sequences which selectively hybridize to a nucleic acid probe.
Proper annealing conditions depend, for example, upon a probe's length, base composition, and the
number of mismatches and their position on the probe, and must often be determined empirically. For
discussions of nucleic acid probe design and annealing conditions, see, for example, Sambrook et al.,
Molecular Cloning: A Laboratory Manual (2nd ed.), Vols. 1-3, Cold Spring Harbor Laboratory (1989) or
Current Protocols in Molecular Biology, F. Ausubel et al., ed., Greene Publishing and
Wiley-Interscience, New York (1987).

The phrase "nucleic acid sequence encoding" refers to a nucleic acid which directs
the expression of a specific protein or peptide. The nucleic acid sequences include both the DNA
strand sequence that is transcribed into RNA and the RNA sequence that is translated into protein.
The nucleic acid sequences include both the full length nucleic acid sequences as well as non-full
length sequences derived from the full length sequence. It is further understood that the sequence
includes the degenerate codons of the native sequence or sequences which may be introduced to
provide codon preference in a specific host cell.

The phrase "isolated" or "substantially pure" refers to nucleic acid preparations that
lack at least one protein or nucleic acid normally associated with the nucleic acid in a host cell.

The phrase "expression cassette" refers to nucleotide sequences which are capable
of affecting expression of a structural gene in hosts compatible with such sequences. Such cassettes
include at least promoters and, optionally, transcription termination signals. Additional factors
necessary or helpful in effecting expression may also be used as described herein.

The term "operably linked" as used herein refers to linkage of a promoter upstream
from a DNA sequence such that the promoter mediates transcription of the DNA sequence.

The term "vector" refers to viral expression systems, autonomous self-replicating
circular DNA (plasmids), and includes both expression and nonexpression plasmids. Where a
recombinant microorganism or cell culture is described as hosting an "expression vector," this includes
both extrachromosomal circular DNA and DNA that has been incorporated into the host
chromosome(s). Where a vector is being maintained by a host cell, the vector may either be stably
replicated by the cells during mitosis as an autonomous structure, or be incorporated within the host's
genome.

The term "gene" as used herein is intended to refer to a nucleic acid sequence which
encodes a polypeptide. This definition includes various sequence polymorphisms, mutations, and/or
sequence variants wherein such alterations do not affect the function of the gene product. The term
"gene" is intended to include not only coding sequences but also regulatory regions such as promoters,
enhancers, and termination regions. The term further includes all introns and other DNA sequences
spliced from the mRNA transcript along with variants resulting from alternative splice sites.

The term "plasmid" refers to an autonomous circular DNA molecule capable of
replication in a cell, and includes both the expression and nonexpression types. Where a recombinant
microorganism or cell culture is described as hosting an "expression plasmid", this includes both
extrachromosomal circular DNA molecules and DNA that has been incorporated into the host
chromosome(s). Where a plasmid is being maintained by a host cell, the plasmid is either being stably
replicated by the cells during mitosis as an autonomous structure or is incorporated within the host's
genome.



WO 98/14466



PCT/US97/17658 



The phrase "recombinant protein" or "recombinantly produced protein" refers to a
peptide or protein produced using non-native cells that do not have an endogenous copy of DNA able
to express the protein. The cells produce the protein because they have been genetically altered by
the introduction of the appropriate nucleic acid sequence. The recombinant protein will not be found in
association with proteins and other subcellular components normally associated with the cells
producing the protein. The terms "protein" and "polypeptide" are used interchangeably herein.

The following terms are used to describe the sequence relationships between two or
more nucleic acids or polynucleotides: "reference sequence", "comparison window", "sequence
identity", "percentage of sequence identity", and "substantial identity". A "reference sequence" is a
defined sequence used as a basis for a sequence comparison; a reference sequence may be a subset
of a larger sequence, for example, as a segment of a full-length cDNA or gene sequence given in a
sequence listing, or may comprise a complete cDNA or gene sequence.

Optimal alignment of sequences for aligning a comparison window may, for example,
be conducted by the local homology algorithm of Smith and Waterman Adv. Appl. Math. 2:482 (1981),
by the homology alignment algorithm of Needleman and Wunsch J. Mol. Biol. 48:443 (1970), by the
search for similarity method of Pearson and Lipman Proc. Natl. Acad. Sci. U.S.A. 85:2444 (1988), or by
computerized implementations of these algorithms (for example, GAP, BESTFIT, FASTA, and
TFASTA in the Wisconsin Genetics Software Package Release 7.0, Genetics Computer Group, 575
Science Dr., Madison, WI).
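
For orientation, the global alignment approach of Needleman and Wunsch can be sketched as a short dynamic program. The sketch below computes only the optimal alignment score; the scoring values (match +1, mismatch -1, linear gap -1) are illustrative assumptions and are not the defaults of GAP, BESTFIT, or any other named program.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Score of the best global alignment of `a` and `b` (linear gap cost)."""
    # prev[j] is the best score for aligning the rows of `a` processed so far
    # against the prefix b[:j]; row 0 aligns the empty string to b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against the empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = prev[j] + gap      # a[i-1] is aligned to a gap
            gap_in_a = curr[j - 1] + gap  # b[j-1] is aligned to a gap
            curr.append(max(diagonal, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]
```

Only two rows of the score matrix are kept, which is sufficient when the score alone is needed; recovering the alignment itself would require retaining the full matrix for traceback.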

The terms "substantial identity" or "substantial sequence identity" as applied to nucleic
acid sequences and as used herein denote a characteristic of a polynucleotide sequence, wherein
the polynucleotide comprises a sequence that has at least 85 percent sequence identity, preferably at
least 90 to 95 percent sequence identity, and more preferably at least 99 percent sequence identity as
compared to a reference sequence over a comparison window of at least 20 nucleotide positions,
frequently over a window of at least 25-50 nucleotides, wherein the percentage of sequence identity is
calculated by comparing the reference sequence to the polynucleotide sequence which may include
deletions or additions which total 20 percent or less of the reference sequence over the window of
comparison. The reference sequence may be a subset of a larger sequence.
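
The thresholds above can be illustrated with a small calculation. The following is a sketch under stated assumptions: the two sequences are supplied already aligned (equal length, "-" marking a deletion or addition), and percent identity is taken as identical columns divided by the total number of aligned columns. Other denominators are in common use, and the function names are hypothetical.

```python
def percent_identity(ref_aln: str, qry_aln: str) -> float:
    """Percent of aligned columns in which both sequences carry the same base."""
    if len(ref_aln) != len(qry_aln) or not ref_aln:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    # A column counts as identical only if it is not a gap in either sequence.
    matches = sum(1 for r, q in zip(ref_aln, qry_aln) if r == q and r != "-")
    return 100.0 * matches / len(ref_aln)


def is_substantially_identical(ref_aln: str, qry_aln: str,
                               min_window: int = 20,
                               min_identity: float = 85.0) -> bool:
    """Apply the 85 percent threshold over a window of at least 20 positions."""
    window = sum(1 for r in ref_aln if r != "-")  # reference positions covered
    return window >= min_window and percent_identity(ref_aln, qry_aln) >= min_identity
```

Raising `min_identity` to 90, 95, or 99 gives the progressively preferred levels recited in the definition.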

As applied to polypeptides, the terms "substantial identity" or "substantial sequence
identity" mean that two peptide sequences, when optimally aligned, such as by the programs GAP or
BESTFIT using default gap weights, share at least 80 percent sequence identity, preferably at least 90
percent sequence identity, more preferably at least 95 percent sequence identity or more.
"Percentage amino acid identity" or "percentage amino acid sequence identity" refers to a comparison
of the amino acids of two polypeptides which, when optimally aligned, have approximately the
designated percentage of the same amino acids. For example, "95% amino acid identity" refers to a
comparison of the amino acids of two polypeptides which when optimally aligned have 95% amino acid
identity. Preferably, residue positions which are not identical differ by conservative amino acid
substitutions. For example, the substitution of amino acids having similar chemical properties such as
charge or polarity is not likely to affect the properties of a protein. Examples include glutamine for
asparagine or glutamic acid for aspartic acid.
The phrase "substantially purified" or "isolated" when referring to a peptide or protein
means a chemical composition which is essentially free of other cellular components. It is preferably in
a homogeneous state although it can be in either a dry or aqueous solution. Purity and homogeneity
are typically determined using analytical chemistry techniques such as polyacrylamide gel
electrophoresis or high performance liquid chromatography. A protein which is the predominant
species present in a preparation is substantially purified. Generally, a substantially purified or isolated
protein will comprise more than 80% of all macromolecular species present in the preparation.
Preferably, the protein is purified to represent greater than 90% of all macromolecular species present.
More preferably the protein is purified to greater than 95%, and most preferably the protein is purified to
essential homogeneity, wherein other macromolecular species are not detected by conventional
techniques.

The phrase "specifically binds to an antibody" or "specifically immunoreactive with",
when referring to a protein or peptide, refers to a binding reaction which is determinative of the
presence of the protein in the presence of a heterogeneous population of proteins and other biologics.
Thus, under designated immunoassay conditions, the specified antibodies bind to a particular protein
and do not bind in a significant amount to other proteins present in the sample. Specific binding to an
antibody under such conditions may require an antibody that is selected for its specificity for a particular
protein. A variety of immunoassay formats may be used to select antibodies specifically
immunoreactive with a particular protein. For example, solid-phase ELISA immunoassays are routinely
used to select monoclonal antibodies specifically immunoreactive with a protein. See Harlow and Lane,
Antibodies, A Laboratory Manual, Cold Spring Harbor Publications, New York, for a description
of immunoassay formats and conditions that can be used to determine specific immunoreactivity.

As used herein, "EST" or "Expressed Sequence Tag" refers to a partial DNA or cDNA
sequence of about 150 to 500, more preferably about 300, sequential nucleotides of a longer
sequence obtained from a genomic or cDNA library prepared from a selected cell, cell type, tissue or
tissue type, or organism, which longer sequence corresponds to an mRNA or a gene found in that
library. An EST is generally DNA. One or more libraries made from a single tissue type typically
provide at least 3000 different (i.e., unique) ESTs and potentially the full complement of all possible
ESTs representing all possible cDNAs, e.g., 50,000-100,000 in an animal such as a human. (See,
for example, Adams et al. Science 252:1651-1656 (1991)).

"Stringent" as used herein refers to hybridization and wash conditions of 50% formamide at
42°C. Other stringent hybridization conditions may also be selected. Generally, stringent conditions
are selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence at a
defined ionic strength and pH. The Tm is the temperature (under defined ionic strength and pH) at
which 50% of the target sequence hybridizes to a perfectly matched probe. Typically, stringent
conditions will be those in which the salt concentration is at least about 0.02 molar at pH 7 and the
temperature is at least about 60°C. As other factors may significantly affect the stringency of
hybridization, including, among others, base composition and size of the complementary strands, the
presence of organic solvents and the extent of base mismatching, the combination of parameters is
more important than the absolute measure of any one.
B. Transcript Map and New Genes near HH

The instant invention provides a fine structure map of the 1 megabase region
surrounding the HFE gene. As part of that map the instant invention provides approximately 250 kb of
DNA sequence of which about 235 kb are provided in Figure 8 and eight loci of particular interest
corresponding to candidate genes within the 1 megabase region. These loci are useful as genetic and
physical markers for further mapping studies. Additionally, the eight cDNA sequences corresponding
to those loci are useful, for example, for the isolation of other genes in putative gene families, the
identification of homologs from other species, and as probes for diagnostic assays. In particular,
isolated nucleic acid sequences of at least 18 nucleotides substantially identical to contiguous
nucleotides of a cDNA of the invention are useful as PCR primers. Typically, the PCR primer will be
used as part of a pair of primers in a PCR reaction. Isolated nucleic acid sequences preferably
comprising about 18-100 nucleotides, more preferably at least 18 nucleotides, substantially identical to
contiguous nucleotides in a cDNA of the invention are useful in the design of PCR primers and probes
for hybridization assays. Additionally, the proteins encoded by those cDNAs are useful in the
generation of antibodies for analysis of gene expression and in diagnostic assays, and in the
purification of related proteins.

Thus, in one embodiment of the invention, a 235 kb sequence is provided for the HFE
subregion within the 1 megabase region mapped. This sequence can serve as a reference in genetic
or physical analysis of deletions, substitutions, and insertions in that region. Additionally, the sequence
information provides a resource for the further identification of new genes in that region. Thus, nucleic
acid sequences substantially identical to the 235 kb sequence are also included in the scope of this
invention.

In a further embodiment of the invention, a family of five genes, BTF1-5, is provided
which are related by sequence homology to the milk protein butyrophilin (BT) (Figures 1, 3, and 7).
The predicted amino acid sequences of the proteins encoded by these genes are provided in Figure 3.
These cDNAs are useful for the identification of further members of the BT family and to study
regulation of expression of this family of genes. The proteins encoded by these cDNAs can be useful
in the identification and isolation of ligands for the BT protein, and in the generation of agonists or
antagonists of BT function. Nucleic acid sequences substantially identical to BTF1-5 and the proteins
encoded by them are also included in the scope of this invention, including allelic forms.

In a further embodiment of the invention, a novel gene RoRet is provided, which is
related by sequence homology to the 52 kD Ro/SSA Lupus and Sjogren's syndrome autoantigen. This
sequence is especially useful in the identification of other genes that may be involved in Lupus or
Sjogren's syndrome. The protein encoded by this cDNA can be useful in the identification and isolation
of ligands for the autoantigen, and in the generation of agonists or antagonists of the antigen. Nucleic
acid sequences substantially identical to RoRet and the proteins encoded by them are also included
in the scope of this invention.

In a further embodiment of the invention, two genes, NPT3 and NPT4, with structural
homology to a type 1 sodium transport gene are provided. These cDNAs and the proteins expressed
by them are useful in determining the etiology of hypophosphatemia, along with being useful as probes
in the identification and isolation of further members of the gene family. Nucleic acid sequences
substantially identical to the NPT1-like sequences and the proteins encoded by them are also
included in the scope of this invention.

C. Polymorphic Markers

The invention provides 397 new polymorphic sites in the region of the HFE gene.
These polymorphisms are listed in Table 1. As described below, these polymorphisms were identified
by comparison of the DNA sequence of an affected individual homozygous for the common ancestral
HH mutation with that of an unaffected individual disclosed in copending U.S. 08/724,394.



Table 1. Polymorphic Sites in the HH Region

Base Location   Difference              Base Location   Difference

35-36           AC DEL                  19755           G-A
841             T-C                     19949           [illegible]
2662-2663       TT DEL                  20085           [illegible]
3767            T-C                     20366-20367     [illegible]
3829            C-G                     20463           [illegible]
4925-4928       TAAA DEL                20841           A-T
5691            C-T                     21059           A-T
5839            T-C                     21117           A-G
6011            G-A                     21837           A-C
6047            C-G                     22293           A-?
6231            G-A                     22786           [illegible]
6643            A DEL                   [illegible]     [illegible]
6698            T-C                     [illegible]     T-A
7186            T-C                     26175           G-C
7273            G-A                     26667           [illegible]
7545-7558       TCACACACCGATTGG DEL     26994           T-C
7672            G DEL                   27838           G-T
7933            T-C                     27861           T DEL
8746            T-G                     28132           G-A
9115            G-A                     29100           G-A
9823            G-A                     29454-29457     TTTT DEL
10027           G-A                     29787           T-G
10214           C-T                     29825           A-C
10828           A-G                     30009           T-C
10918           C-G                     30177           A-G
10955           A-G                     30400           A-G
11524           C-A                     31059           T-A
11674           A-G                     31280           C-T
11955           T-C                     31749           C-T
12173-12175     TTT DEL                 32040           C-G
13304           G-A                     32556-32559     TGTG DEL
13455           G-A                     33017           T-G
14416-14417     A INS                   33026           T DEL
14998           C-T                     34434           C-T
15554           T-C                     35179           A-C
15887           A-G                     35695           G-A
15904-15919     CCAAACTGATCTTTGA DEL    35702           G-A
16019           T DEL                   35983           A-G
16211           A-T                     37411           A-G
17461           A-G                     38526           C-T

40431           C-A                     72688           C-G
42054-42055     TT DEL                  75323-75324     T INS
43783-43784     TTTT INS                75887           G-C
45120           C DEL                   77519           T-C
45567           A-C                     77749           G-A
46601           A-T                     77908           T-C
47255           C-G                     78385           C-G
47758           C-A                     78592-78593     AG INS
47994           G-C                     80189           T-G
48440           G-A                     80279           T DEL
48650           T-G                     80989-80990     A INS
48660           A-G                     81193           T-C
50240           C-T                     81273           A DEL
50553           G-A                     82166           G-A
50566           G-T                     83847           T DEL
51322           G-C                     84161-84162     CA-GG
51747           A-G                     84533           A-G
52474           C-G                     84638           T-G
52733           C-A                     85526           T-G
52875           G-A                     85705           G-T
53631-53637     TTTTTTT DEL             86984           T-C
53707           G-A                     87655           T-C
54819           A-G                     87713           A-C
55913           T-C                     87892           C-T
56225           A-C                     88192           T DEL
56510           T-C                     88528           A-G
56566           G-A                     89645           A-T
56818           A-T                     89728           A-G
57815           A-G                     90088           T-C
58011           T DEL                   91193-91194     2209bp INS
58247-58248     T INS                   91373           T-C
58926           C-G                     91433-91434     A INS
59406           C-G                     91747           G-A
59422           G-C                     93625           T DEL
60221-60222     A INS                   95116-95117     T INS
60656-60657     CA DEL                  96315           G-A
61162           G-A                     97981           A-G
61465           G-A                     98351           T DEL
61607           A DEL                   99249           C-T
61653           T-C                     100094-100095   T INS
61794-61795     T INS                   100647-100648   TTC INS
62061           G-C                     100951          C-T
62362           T-G                     101610          C-G
62732           C-G                     102589          C-T
63364           G-A                     103076-103077   TATATATATATATA INS
63430-63431     GT INS                  103747          T-C
63754           C-T                     105638          A-C
63765           A-C                     107024          C-T
63870-63871     A INS                   107322          C-T
64788           A-G                     107858          C-G
64962           G-A                     109019          A DEL
65891           C-T                     109579          T DEL
66675           G-C                     110021          C-A
67186-67187     ATT INS                 111251          C-A
67746-67747     TT INS                  111425          G-A
68259           T-C                     112644          T-A
68836           T-C                     113001          G-C
68976           C-G                     113130          C-T
72508           T-G                     114026          G-A

114250          A DEL                   176222          T-C
115217          C-G                     176524          A-T
117995          G-A                     176684          G-A
118874          A-G                     176815          T-C
119470          T-C                     177049          T-C
119646          G-T                     177065          G-T
120853          C-T                     178285          T-C
121582          G-A                     178551-178552   [illegible] INS
123576          A-C                     179114-179115   A INS
125581          C-T                     179260          C-G
125970          G-T                     179281          C-G
126197          A-G                     180023          G-C
126672          A DEL                   180430          T-C
126672          G-C                     180773          T-C
128220-128221   A INS                   180824          T-C
132569          C-T                     181097          C-T
133572          A-C                     181183          A-T
134064          T-G                     182351          C-T
136999          G-A                     183197          G-A
137784          C-T                     183623          A-T
138903          G-A                     183653          G-T
139159-139160   A INS                   183657          T-G
140359          G-A                     183795-183796   A INS
140898          C-T                     184060          G-A
141313          C DEL                   184993          G-A
141343          T-C                     185918          A-G
142148          T-C                     186036          T-C
142178          C-A                     186506-186507   TAAC INS
142433-142434   ATAGA INS               186561-186568   TATTTATT DEL
143783          C-T                     186690          G DEL
144090          C-T                     186751          T-A
144220-144221   A INS                   187221          A-G
144725          A-C                     187260          A-G
145732-145733   AAAAAAAAAAAAAA INS      187444-187447   CTCT DEL
147016-147017   CG DEL                  187831-187832   C INS
147021          G-T                     188638          G-A
147536          T-G                     188642          C-T
148936          T-A                     189246          T-C
149061          T-C                     190340          A-C
154341          A-T                     190354          A-G
154588          G-A                     190762          A-G
155464          G-A                     191260          G-T
158574          C-G                     193018-193019   AGAT INS
160007          C-T                     193147          T-G
164348          A-T                     193196-193197   C INS
164499          C-G                     193499          C-T
166677-166678   AAAG INS                193738          C-G
167389          G-A                     193984-193985   ACACACAC INS
168506-168507   AGGATGGTCT INS          194064          C-G
168515          T-C                     194504          A DEL
169413-169414   AA INS                  194734          G-A
170300-170301   TTGTTGTTGTTG INS        194890          A-C
170491          G-A                     195404          G-A
173428          T-C                     195693          A-T
173642          G-A                     196205          G-A
173948          T-G                     197424          C-T
175330          T-C                     197513          C-T
175836          T-C                     197670          G-A
176200          G-C                     198055          C-A

198401          C-T                     215947          C-A
198692          A-G                     216232          A-G
198780          T DEL                   217470          G-A
199030          [illegible]             [illegible]     T-C
199933          C-T                     2190?2-2190?3   ATATATATATATTATATATATATATATATATATATATATAT INS
200027          G-A                     219314          C-A
200439          T-A                     219327          G-A
200452          A-G                     219560          C-T
200472-200483   AATAATAATAAT DEL        219660          C-T
200559          A-T                     219889          G-A
200745          A-G                     220198          G-T
200919          T-A                     220384          G-A
201816          C-T                     220451-220452   CAAAAA INS
201861-201862   42bp INS                221363          G-A
202662          T-C                     221645          G-A
202880          T-C                     222119          T-C
204341          C-T                     222358          A-G
204768          A-T                     222367          A-C
205284          T-G                     222686          A-G
207400          C-A                     222959          T-C
208634          T-C                     223270-223271   TT DEL
208718          T DEL                   223283          T-C
208862          A-C                     224964          T-C
209419-209420   TT DEL                  225232          A-C
209802          G-A                     225366-225367   [illegible] INS
209944          C-G                     225416          G-C
210299          A-G                     225486          T-C
211142          G-A                     226088          A-G
212072          G-A                     228421          A-G
212146          T-C                     230047          G-A
212379          G-A                     230109          G-C
212637-212639   TCT DEL                 230376          C-G
212696          T-C                     230394          A-G
213042          T-A                     231226          A-G
214192          A-G                     231447          G-A
214529-214530   [illegible] INS         231835          A-G
214549          T-C                     232400-232402   AAA DEL
214795          C-T                     232402-232403   G INS
214908          T-G                     232515          T-C
214977          A-G                     232703          G-T
215769          C-T                     232750          A-G

* D6S2238 occurs at base 1; 24d1 occurs at base 41316; D6S2239 occurs at base 84841; D6S2241 occurs at base 235032.
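
For readers who wish to work with Table 1 programmatically, each row can be reduced to a small record. The following is a minimal sketch, assuming the notation used in the table: "X-Y" for a substitution, "<bases> DEL" for a deletion, and "<bases> INS" for an insertion between two adjacent positions. The `Site` type and `parse_site` function are hypothetical names; length-only entries such as "2209bp INS" are deliberately rejected rather than guessed at.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    start: int   # first base position of the site
    end: int     # last base position (same as start for a single base)
    kind: str    # "SUB", "DEL", or "INS"
    bases: str   # "T-C" for a substitution, else the deleted or inserted bases


def parse_site(location: str, difference: str) -> Site:
    """Parse one Table 1 row, e.g. ("35-36", "AC DEL") or ("841", "T-C")."""
    positions = [int(p) for p in location.split("-")]
    start, end = positions[0], positions[-1]
    diff = difference.strip().upper()
    # Deletions and insertions: bases followed by DEL or INS, space optional.
    indel = re.fullmatch(r"([ACGT]+)\s*(DEL|INS)", diff)
    if indel:
        return Site(start, end, indel.group(2), indel.group(1))
    # Substitutions: the two alleles separated by a hyphen, order as printed.
    sub = re.fullmatch(r"([ACGT]+)-([ACGT]+)", diff)
    if sub:
        return Site(start, end, "SUB", f"{sub.group(1)}-{sub.group(2)}")
    raise ValueError(f"unrecognized difference: {difference!r}")
```

The substitution alleles are kept in the printed order; the sketch makes no assumption about which allele belongs to the affected individual.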



Table 2. Polymorphic Allele Frequencies

Location    Frequency of ancestral variant    Frequency of unaffected variant
            in random chromosomes             in random chromosomes

232703      53%                               47%
231835      53%                               47%
230394      85%                               15%
230376      25%                               75%
230109      53%                               47%
225486      45%                               55%
225416      75%                               25%
220198      43%                               57%
219660      58%                               42%
219560      53%                               47%
214977      65%                               35%
214908      50%                               50%
214795      24%                               76%
214549      53%                               47%
214192      65%                               35%
210299      53%                               47%
208862      80%                               20%
208634      48%                               52%
207400      25%                               75%
205284      50%                               50%
204341      53%                               47%
202880      58%                               42%
202662      98%                               2%
200027      25%                               75%
199030      58%                               42%
198692      65%                               45%
198401      55%                               45%
198055      55%                               45%
195693      60%                               40%
195404      25%                               75%
194890      55%                               45%
175330      53%                               47%
173948      83%                               17%
173642      55%                               45%
173428      80%                               20%
168515      80%                               20%
160007      18%                               82%
149061      58%                               42%
148936      82%                               18%
147536      100%                              0%
147021      46%                               54%
141343      55%                               45%
140359      55%                               45%
138903      55%                               45%
132569      81%                               19%
125581      18%                               82%
121582      80%                               20%
120853      18%                               82%
118874      65%                               15%
115217      50%                               50%
113130      40%                               60%
113001      48%                               52%
107858      48%                               52%
103747      50%                               50%
96315       25%                               75%
91194       80%                               20%
90088       75%                               25%
89728       50%                               50%
89645       50%                               50%
88528       63%                               37%
87892       75%                               25%
87713       60%                               40%
87655       50%                               50%
86984       79%                               21%
85705       50%                               50%
85526       50%                               50%
84638       50%                               50%
84533       50%                               50%
82166       78%                               22%
81193       58%                               42%
80189       50%                               50%
78385       80%                               20%
77908       88%                               12%
68976       50%                               50%
68259       51%                               49%
66675       80%                               20%
62732       50%                               50%
62362       40%                               60%
61653       48%                               52%
61465       5%                                95%
61162       60%                               40%
53707       100%                              0%
52875       50%                               50%
52733       74%                               26%
52474       47%                               53%
50586       50%                               50%
50553       50%                               50%
50240       50%                               50%
48680       53%                               47%
48650       63%                               37%
48440       50%                               50%
47255       50%                               50%
46601       53%                               47%
45567       49%                               51%
41316       5%                                95%
40431       20%                               80%
38526       23%                               77%
37411       70%                               30%
35983       5%                                95%



These polymorphisms provide surrogate markers for use in diagnostic assays to
detect the likely presence of the mutations 24d1 and/or 24d2, preferably 24d1, in homozygotes or
heterozygotes. Thus, for example, DNA or RNA from an individual is assessed for the presence or
absence of a genotype defined by a polymorphic allele of Table 1, wherein, as a result, the absence of
a genotype defined by a polymorphic allele of Table 1 indicates the likely absence of the HFE gene
mutation in the genome of the individual and the presence of the genotype indicates the likely presence
of the HFE gene mutation in the genome of the individual.

These markers may be used singly, in combination with each other, or with other
polymorphic markers (such as those disclosed in co-pending PCT application WO 96/06583) in
diagnostic assays for the likely presence of the HFE gene mutation in an individual. For example, any
of the markers defined by the polymorphic sites of Table 1 can be used in diagnostic assays in
combination with 24d1 or 24d2, or at least one of polymorphisms HHP-1, HHP-19, or HHP-29, or
microsatellite repeat alleles 19D9:205; 18B4:235; 1A2:239; 1E4:271; 24E2:245; 288:206; 3321-1:98;
4073-1:182; 4440-1:180; 4440-2:139; 731-1:177; 5091-1:148; 3216-1:221; 4072-2:170; 950-1:142;
950-2:164; 950-3:165; 950-4:128; 950-6:151; 950-8:137; 63-1:151; 63-2:113; 63-3:169; 65-1:206;
65-2:159; 68-1:167; 241-5:108; 241-29:113; 373-8:151; and 373-29:113, D6S258:199; D6S265:122;
D6S105:124; D6S306:238; D6S484:206; and D6S1001:180.

Table 2 lists the frequency of about 100 of the alleles defined by the polymorphic sites
of the invention in the general population. As is evident from the Table, certain of these alleles are
present rarely in the general population. These polymorphisms are thus preferred as surrogate
markers in diagnostic assays for the presence of a mutant HFE allele ("gene mutation") such as 24d1
or 24d2. Preferably, the frequency of the polymorphic allele used in the diagnostic assay in the
general population is less than about 50%, more preferably less than about 25%, and most preferably
less than about 5%. Thus, of the genotypes defined by the alleles listed in Table 2, polymorphisms
occurring at base 35983 and base 61465 of Figure 1 are preferred.
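
The preference tiers just described can be illustrated with a few rows of Table 2. This is a sketch only: "less than about X%" is read here as "at most X%", which is an assumption made so that the 5% alleles at bases 35983 and 61465 fall in the most preferred tier, as the text indicates. The names below are hypothetical.

```python
# A few rows transcribed from Table 2: base location -> frequency (%) of the
# ancestral variant in random chromosomes.
ANCESTRAL_FREQ = {
    232703: 53, 230376: 25, 214795: 24, 202662: 98,
    160007: 18, 61465: 5, 35983: 5,
}


def preference_tier(freq_percent: float) -> str:
    """Map a general-population frequency onto the tiers named in the text."""
    # Assumption: "less than about X%" is treated as "<= X".
    if freq_percent <= 5:
        return "most preferred"
    if freq_percent <= 25:
        return "more preferred"
    if freq_percent <= 50:
        return "preferred"
    return "not preferred"


def rank_markers(freqs: dict) -> list:
    """Sites ordered from the rarest ancestral variant to the most common."""
    return sorted(freqs, key=lambda site: (freqs[site], site))


most_preferred = [s for s in rank_markers(ANCESTRAL_FREQ)
                  if preference_tier(ANCESTRAL_FREQ[s]) == "most preferred"]
```

Rarer ancestral alleles are ranked first because their presence in a sample is less likely to be coincidental.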

It will be understood by those of skill in the art that because they were identified in an
ancestral HH homozygote, the haplotypes defined by the polymorphic sites of Table 1 are predictive of
the likely presence of the HFE gene mutation 24d1. Thus, for example, the likelihood of any affected
individual having at least two or more of any of the polymorphic alleles defined by Table 1 is greater
than that for any unaffected individual. Similarly, the likelihood of any affected individual having at least
three or more of any of the polymorphic alleles defined by Table 1 is greater than that for any
unaffected individual.

Thus, for example, in a diagnostic assay for the likely presence of the HFE gene
mutation in the genome of the individual, DNA or RNA from the individual is assessed for the presence
or absence of a haplotype of Table 1, wherein, as a result, the absence of a haplotype of Table 1
indicates the likely absence of the HFE gene mutation in the genome of the individual and the
presence of the haplotype indicates the likely presence of the HFE gene mutation in the genome of the
individual.
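
The presence-or-absence logic of such an assay can be sketched as a simple comparison of observed alleles against a haplotype. This is an illustrative sketch, not a diagnostic procedure: the allele letters assigned to each base below are placeholders (Table 1 lists a difference at each of these bases, but which allele is ancestral is assumed here), and the function names and the `min_sites` threshold are hypothetical.

```python
from typing import Mapping


def count_shared_alleles(observed: Mapping[int, str],
                         haplotype: Mapping[int, str]) -> int:
    """Number of haplotype sites at which the observed allele matches."""
    return sum(1 for site, allele in haplotype.items()
               if observed.get(site) == allele)


def haplotype_present(observed: Mapping[int, str],
                      haplotype: Mapping[int, str],
                      min_sites: int = 2) -> bool:
    """True if at least `min_sites` alleles of the haplotype are observed."""
    return count_shared_alleles(observed, haplotype) >= min_sites


# Placeholder allele assignments at three Table 1 positions (assumed).
EXAMPLE_HAPLOTYPE = {35983: "G", 61465: "A", 160007: "T"}
```

Sites that were not genotyped are simply absent from `observed` and therefore never count as a match.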

The markers defined by the polymorphic sites of Table 1 are additionally useful as 
markers for genetic analysis of the inheritance of certain HFE alleles and other genes which occur 
within the chromosomal region corresponding to the sequence of Figure 9 which include, for example, 
those disclosed in copending U.S.S.N. 08/724,394. 

As the entire nucleotide sequence of the region is provided in Figure 9, it will be
evident to those of ordinary skill in the art which sequences to use as primers or probes for detecting
each polymorphism of interest. Thus, in some embodiments of the invention, the nucleotide
sequences of the invention include at least one oligonucleotide pair selected from the sequence of
Figure 9 or its complement for amplification of a polymorphic site of Table 1. Furthermore, in some
embodiments of the invention a preferred hybridization probe is an oligonucleotide comprising at least
8 to about 100 consecutive bases from the sequence of Figure 9, or the complement of the sequence,
wherein the at least 8 to about 100 consecutive bases include at least one polymorphic site of Table
1. In some embodiments the polymorphic site is at base 35983 or base 61465.
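
As a minimal sketch of the probe definition above, the following Python enumerates every window of a chosen length that contains a given polymorphic site. The function name, the toy sequence, and the 1-based coordinate convention are assumptions made for illustration; the bounds of 8 and 100 bases are taken from the text, with "about 100" simplified to a hard limit.

```python
def probes_covering_site(sequence, site, length):
    """Return all windows of `length` consecutive bases that include `site`.

    `site` is a 1-based position in `sequence` (an assumed convention).
    """
    if not 8 <= length <= 100:
        raise ValueError("probe length must be between 8 and 100 bases")
    if not 1 <= site <= len(sequence):
        raise ValueError("site lies outside the sequence")
    idx = site - 1
    # Earliest and latest window starts that still contain the site
    first_start = max(0, idx - length + 1)
    last_start = min(idx, len(sequence) - length)
    return [sequence[s:s + length] for s in range(first_start, last_start + 1)]


toy = "ACGTACGTACGTTTGA"  # hypothetical 16-base sequence, not from Figure 9
probes = probes_covering_site(toy, site=10, length=8)
```

Probes from the complementary strand could be obtained by reverse-complementing each window.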

It will also be appreciated that the nucleic acid sequences of the invention include
isolated nucleic acid molecules comprising about 100 consecutive bases to about 235 kb substantially
identical to the sequence of Figure 9, wherein the DNA molecule comprises at least one polymorphic
site of Table 1. Such isolated DNA sequences are useful as primers, probes, or as the component of a
kit in diagnostic assays for detecting the likely presence of the HFE gene mutation in an individual.
D. Nucleic Acid Based Screening

Individuals carrying polymorphic alleles of the invention may be detected at either the
DNA, the RNA, or the protein level using a variety of techniques that are well known in the art. The
genomic DNA used for the diagnosis may be obtained from body cells, such as those present in
peripheral blood, urine, saliva, bucca, surgical specimens, and autopsy specimens. The DNA may be
used directly or may be amplified enzymatically in vitro through use of PCR (Saiki et al. Science
239:487-491 (1988)) or other in vitro amplification methods such as the ligase chain reaction (LCR)
(Wu and Wallace Genomics 4:560-569 (1989)), strand displacement amplification (SDA) (Walker et al.
Proc. Natl. Acad. Sci. U.S.A. 89:392-396 (1992)), or self-sustained sequence replication (3SR) (Fahy et
al. PCR Methods Appl. 1:25-33 (1992)), prior to mutation analysis. The methodology for preparing
nucleic acids in a form that is suitable for mutation detection is well known in the art.

The detection of polymorphisms in specific DNA sequences, such as in the region of
the HFE gene, can be accomplished by a variety of methods including, but not limited to,
restriction-fragment-length-polymorphism detection based on allele-specific restriction-endonuclease cleavage
(Kan and Dozy Lancet ii:910-912 (1978)), hybridization with allele-specific oligonucleotide probes
(Wallace et al. Nucl Acids Res 6:3543-3557 (1978)), including immobilized oligonucleotides (Saiki et
al. Proc. Natl. Acad. Sci. U.S.A. 86:6230-6234 (1989)) or oligonucleotide arrays (Maskos and Southern
Nucl Acids Res 21:2269-2270 (1993)), allele-specific PCR (Newton et al. Nucl Acids Res 17:2503-2516
(1989)), mismatch-repair detection (MRD) (Faham and Cox Genome Res 5:474-482 (1995)),
binding of MutS protein (Wagner et al. Nucl Acids Res 23:3944-3948 (1995)), denaturing-gradient gel
electrophoresis (DGGE) (Fisher and Lerman et al. Proc. Natl. Acad. Sci. U.S.A. 80:1579-1583 (1983)),
single-strand-conformation-polymorphism detection (Orita et al. Genomics 5:874-879 (1983)), RNAase
cleavage at mismatched base-pairs (Myers et al. Science 230:1242 (1985)), chemical (Cotton et al.
Proc. Natl. Acad. Sci. U.S.A. 85:4397-4401 (1988)) or enzymatic (Youil et al. Proc. Natl. Acad. Sci.
U.S.A. 92:87-91 (1995)) cleavage of heteroduplex DNA, methods based on allele specific primer
extension (Syvänen et al. Genomics 8:684-692 (1990)), genetic bit analysis (GBA) (Nikiforov et al. Nucl
Acids Res 22:4167-4175 (1994)), the oligonucleotide-ligation assay (OLA) (Landegren et al. Science
241:1077 (1988)), the allele-specific ligation chain reaction (LCR) (Barany Proc. Natl. Acad. Sci.
U.S.A. 88:189-193 (1991)), gap-LCR (Abravaya et al. Nucl Acids Res 23:675-682 (1995)), radioactive
and/or fluorescent DNA sequencing using standard procedures well known in the art, and peptide
nucleic acid (PNA) assays (Orum et al., Nucl. Acids Res. 21:5332-5356 (1993); Thiede et al., Nucl.
Acids Res. 24:983-984 (1996)).

In addition to the genotypes defined by the polymorphisms of the invention, as
described in co-pending PCT application WO 96/35802 published November 14, 1996, genotypes
characterized by the presence of the alleles 19D9:205; 18B4:235; 1A2:239; 1E4:271; 24E2:245;
2B8:206; 3321-1:98 (denoted 3321-1:197 therein); 4073-1:182; 4440-1:180; 4440-2:139; 731-1:177;
5091-1:148; 3216-1:221; 4072-2:170 (denoted 4072-2:148 therein); 950-1:142; 950-2:164; 950-3:165;
950-4:128; 950-6:151; 950-8:137; 63-1:151; 63-2:113; 63-3:169; 65-1:206; 65-2:159; 68-1:167;
241-5:108; 241-29:113; 373-8:151; and 373-29:113, alleles D6S258:199, D6S265:122, D6S105:124,
D6S306:238, D6S464:206, and D6S1001:180, and/or alleles associated with the HHP-1, the HHP-19
or HHP-29 single base-pair polymorphisms can also be used to assist in the identification of an
individual whose genome contains 24d1 and/or 24d2. For example, the assessing step can be
performed by a process which comprises subjecting the DNA or RNA to amplification using
oligonucleotide primers flanking a polymorphism of Table 1, oligonucleotides flanking 24d1 and/or
24d2, oligonucleotide primers flanking at least one of the base-pair polymorphisms HHP-1, HHP-19,
and HHP-29, oligonucleotide primers flanking at least one of the microsatellite repeat alleles, or
oligonucleotide primers for any combination of polymorphisms or microsatellite repeat alleles thereof.

Oligonucleotides useful in diagnostic assays are typically at least 8 consecutive
nucleotides in length, and may range upwards of 18 nucleotides in length to greater than 100 or more
consecutive nucleotides. Such oligonucleotides can be derived from either the genomic DNA of Figure
8 or 9, or cDNA sequences derived therefrom, or may be synthesized.

Additionally, the proteins encoded by such cDNAs are useful in the generation of 
antibodies for analysis of gene expression and in diagnostic assays, and in the purification of related 
proteins. 

E. General Methods 

The nucleic acid compositions of this invention, whether RNA, cDNA, genomic DNA,
or a hybrid of the various combinations, may be isolated from natural sources, including cloned DNA,
or may be synthesized in vitro. The nucleic acids claimed may be present in transformed or
transfected whole cells, in a transformed or transfected cell lysate, or in a partially purified or
substantially pure form.

Techniques for nucleic acid manipulation of the
nucleic acid sequences of the invention such as subcloning nucleic acid sequences encoding
polypeptides into expression vectors, labeling probes, DNA hybridization, and the like are described
generally in Sambrook et al., Molecular Cloning - A Laboratory Manual (2nd Ed.), Vol. 1-3, Cold Spring
Harbor Laboratory, Cold Spring Harbor, New York, (1989), which is incorporated herein by reference.
This manual is hereinafter referred to as "Sambrook et al."

There are various methods of isolating the nucleic acid sequences of the invention.
For example, DNA is isolated from a genomic or cDNA library using labeled oligonucleotide probes
having sequences complementary to the sequences disclosed herein. Such probes can be used
directly in hybridization assays. Alternatively, probes can be designed for use in amplification
techniques such as PCR.

To prepare a cDNA library, mRNA is isolated from tissue such as heart or pancreas,
preferably a tissue wherein expression of the gene or gene family is likely to occur. cDNA is prepared
from the mRNA and ligated into a recombinant vector. The vector is transfected into a recombinant
host for propagation, screening and cloning. Methods for making and screening cDNA libraries are
well known. See Gubler, U. and Hoffman, B.J. Gene 25:263-269 (1983) and Sambrook et al.

For a genomic library, for example, the DNA is extracted from tissue and either
mechanically sheared or enzymatically digested to yield fragments of about 12-20 kb. The fragments
are then separated by gradient centrifugation from undesired sizes and are constructed in
bacteriophage lambda vectors. These vectors and phage are packaged in vitro, as described in
Sambrook et al. Recombinant phage are analyzed by plaque hybridization as described in Benton
and Davis, Science 196:180-182 (1977). Colony hybridization is carried out as generally described in
M. Grunstein et al. Proc. Natl. Acad. Sci. USA. 72:3961-3965 (1975).

DNA of interest is identified in either cDNA or genomic libraries by its ability to hybridize
with nucleic acid probes, for example on Southern blots, and these DNA regions are isolated by
standard methods familiar to those of skill in the art. See Sambrook et al.

In PCR techniques, oligonucleotide primers complementary to the two 3' borders of
the DNA region to be amplified are synthesized. The polymerase chain reaction is then carried out
using the two primers. See PCR Protocols: A Guide to Methods and Applications (Innis, M., Gelfand,
D., Sninsky, J. and White, T., eds.), Academic Press, San Diego (1990). Primers can be selected to
amplify the entire regions encoding a full-length sequence of interest or to amplify smaller DNA
segments as desired.
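
The primer geometry described above can be illustrated with a short Python sketch. It assumes the template is given as its top strand written 5' to 3'; the forward primer then matches the 5' end of the target region (and so anneals to the 3' border of the bottom strand), while the reverse primer is the reverse complement of the region's 3' end. The function names, the default primer length, and the toy template are hypothetical, and practical primer design would also weigh melting temperature, secondary structure, and specificity.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def flanking_primers(template, start, end, primer_len=20):
    """Pick a primer pair at the borders of template[start:end].

    Coordinates are 0-based and half-open, as in Python slicing.
    """
    region = template[start:end]
    if len(region) < 2 * primer_len:
        raise ValueError("region is too short for two non-overlapping primers")
    forward = region[:primer_len]                       # same sense as the top strand
    reverse = reverse_complement(region[-primer_len:])  # anneals to the top strand's 3' border
    return forward, reverse


toy_template = "ATGCCGTAGGCTTAAC"  # hypothetical sequence for illustration
fwd, rev = flanking_primers(toy_template, 0, len(toy_template), primer_len=5)
```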

PCR can be used in a variety of protocols to isolate cDNAs encoding a sequence of
interest. In these protocols, appropriate primers and probes for amplifying DNA encoding a sequence
of interest are generated from analysis of the DNA sequences listed herein. Once such regions are
PCR-amplified, they can be sequenced and oligonucleotide probes can be prepared from the sequence
obtained.

Oligonucleotides for use as primers or probes are chemically synthesized according to
the solid phase phosphoramidite triester method first described by Beaucage, S.L. and Carruthers,
M.H., Tetrahedron Lett., 22(20):1859-1862 (1981) using an automated synthesizer, as described in
Needham-VanDevanter, D.R., et al., Nucleic Acids Res., 12:6159-6168 (1984). Purification of
oligonucleotides is by either native acrylamide gel electrophoresis or by anion-exchange HPLC as
described in Pearson, J.D. and Regnier, F.E., J. Chrom., 255:137-149 (1983). The sequence of the
synthetic oligonucleotide can be verified using the chemical degradation method of Maxam, A.M. and
Gilbert, W., in Grossman, L. and Moldave, D., eds. Academic Press, New York, Methods in
Enzymology 65:499-560 (1980).
1. Expression 

Once DNA encoding a sequence of interest is isolated and cloned, one can express
the encoded proteins in a variety of recombinantly engineered cells. It is expected that those of skill in
the art are knowledgeable in the numerous expression systems available for expression of DNA
encoding a sequence of interest. No attempt to describe in detail the various methods known for the
expression of proteins in prokaryotes or eukaryotes is made here.

In brief summary, the expression of natural or synthetic nucleic acids encoding a
sequence of interest will typically be achieved by operably linking the DNA or cDNA to a promoter
(which is either constitutive or inducible), followed by incorporation into an expression vector. The
vectors can be suitable for replication and integration in either prokaryotes or eukaryotes. Typical
expression vectors contain transcription and translation terminators, initiation sequences, and
promoters useful for regulation of the expression of the polynucleotide sequence of interest. To obtain
high level expression of a cloned gene, it is desirable to construct expression plasmids which contain,
at the minimum, a strong promoter to direct transcription, a ribosome binding site for translational
initiation, and a transcription/translation terminator. The expression vectors may also comprise generic
expression cassettes containing at least one independent terminator sequence, sequences permitting
replication of the plasmid in both eukaryotes and prokaryotes, i.e., shuttle vectors, and selection
markers for both prokaryotic and eukaryotic systems. See Sambrook et al. Examples of expression of
ATP-sensitive potassium channel proteins in both prokaryotic and eukaryotic systems are described
below.

a. Expression in Prokaryotes
A variety of prokaryotic expression systems may be used to express the proteins of the
invention. Examples include E. coli, Bacillus, Streptomyces, and the like.

It is preferred to construct expression plasmids which contain, at the minimum, a
strong promoter to direct transcription, a ribosome binding site for translational initiation, and a
transcription/translation terminator. Examples of regulatory regions suitable for this purpose in E. coli
are the promoter and operator region of the E. coli tryptophan biosynthetic pathway as described by
Yanofsky, C. J. Bacteriol. 158:1018-1024 (1984) and the leftward promoter of phage lambda (Pλ) as
described by Herskowitz, I. and Hagen, D., Ann. Rev. Genet. 14:399-445 (1980). The inclusion of
selection markers in DNA vectors transformed in E. coli is also useful. Examples of such markers
include genes specifying resistance to ampicillin, tetracycline, or chloramphenicol. See Sambrook et
al. for details concerning selection markers for use in E. coli.

To enhance proper folding of the expressed recombinant protein, during purification
from E. coli, the expressed protein may first be denatured and then renatured. This can be
accomplished by solubilizing the bacterially produced proteins in a chaotropic agent such as guanidine
HCl and reducing all the cysteine residues with a reducing agent such as beta-mercaptoethanol. The
protein is then renatured, either by slow dialysis or by gel filtration. See U.S. Patent No. 4,511,503.
Detection of the expressed antigen is achieved by methods known in the art, such as
radioimmunoassay, Western blotting techniques or immunoprecipitation. Purification from E. coli
can be achieved following procedures such as those described in U.S. Patent No. 4,511,503.
b. Expression in Eukaryotes

A variety of eukaryotic expression systems such as yeast, insect cell lines, bird, fish,
and mammalian cells, are known to those of skill in the art. As explained briefly below, a sequence of
interest may be expressed in these eukaryotic systems.

Synthesis of heterologous proteins in yeast is well known. Methods in Yeast Genetics,
Sherman, F., et al., Cold Spring Harbor Laboratory, (1982) is a well recognized work describing the
various methods available to produce the protein in yeast.

Suitable vectors usually have expression control sequences, such as promoters,
including 3-phosphoglycerate kinase or other glycolytic enzymes, and an origin of replication,
termination sequences and the like as desired. For instance, suitable vectors are described in the
literature (Botstein, et al., Gene 8:17-24 (1979); Broach, et al., Gene 8:121-133 (1979)).

Two procedures are used in transforming yeast cells. In one case, yeast cells are first
converted into protoplasts using zymolyase, lyticase or glusulase, followed by addition of DNA and
polyethylene glycol (PEG). The PEG-treated protoplasts are then regenerated in a 3% agar medium
under selective conditions. Details of this procedure are given in the papers by J.D. Beggs, Nature
(London) 275:104-109 (1978); and Hinnen, A., et al., Proc. Natl. Acad. Sci. 75:1929-1933
(1978). The second procedure does not involve removal of the cell wall. Instead the cells are treated
with lithium chloride or acetate and PEG and put on selective plates (Ito, H., et al., J. Bact., 153:163-168
(1983)).

The proteins of the invention, once expressed, can be isolated from yeast by lysing the
cells and applying standard protein isolation techniques to the lysates. The monitoring of the
purification process can be accomplished by using Western blot techniques or radioimmunoassay or
other standard immunoassay techniques.

The sequences encoding the proteins of the invention can also be ligated to various
expression vectors for use in transforming cell cultures of, for instance, mammalian, insect, bird or fish
origin. Illustrative of cell cultures useful for the production of the polypeptides are mammalian cells.
Mammalian cell systems often will be in the form of monolayers of cells although mammalian cell
suspensions may also be used. A number of suitable host cell lines capable of expressing intact
proteins have been developed in the art, and include the HEK293, BHK21, and CHO cell lines, and
various human cells such as COS cell lines, HeLa cells, myeloma cell lines, Jurkat cells, etc.
Expression vectors for these cells can include expression control sequences, such as an origin of
replication, a promoter (e.g., the CMV promoter, a HSV tk promoter or pgk (phosphoglycerate kinase)
promoter), an enhancer (Queen et al. Immunol. Rev 89:49 (1985)), and necessary processing
information sites, such as ribosome binding sites, RNA splice sites, polyadenylation sites (e.g., an SV40
large T Ag poly A addition site), and transcriptional terminator sequences. Other animal cells useful for
production of ATP-sensitive potassium channel proteins are available, for instance, from the American
Type Culture Collection Catalogue of Cell Lines and Hybridomas (7th edition, (1992)).

Appropriate vectors for expressing the proteins of the invention in insect cells are
usually derived from the SF9 baculovirus. Suitable insect cell lines include mosquito larvae, silkworm,
armyworm, moth and Drosophila cell lines such as a Schneider cell line (see Schneider J. Embryol.
Exp. Morphol. 27:353-365 (1987)).

As indicated above, the vector, e.g., a plasmid, which is used to transform the host
cell, preferably contains DNA sequences to initiate transcription and sequences to control the
translation of the protein. These sequences are referred to as expression control sequences.

As with yeast, when higher animal host cells are employed, polyadenylation or
transcription terminator sequences from known mammalian genes need to be incorporated into the
vector. An example of a terminator sequence is the polyadenylation sequence from the bovine growth
hormone gene. Sequences for accurate splicing of the transcript may also be included. An example
of a splicing sequence is the VP1 intron from SV40 (Sprague, J. et al., J. Virol. 45:773-781 (1983)).

Additionally, gene sequences to control replication in the host cell may be
incorporated into the vector such as those found in bovine papilloma virus type-vectors.
Saveria-Campo, M., 1985, "Bovine Papilloma virus DNA a Eukaryotic Cloning Vector" in DNA Cloning
Vol. II a Practical Approach Ed. D.M. Glover, IRL Press, Arlington, Virginia pp. 213-238.

The host cells are competent or rendered competent for transformation by various
means. There are several well-known methods of introducing DNA into animal cells. These include:
calcium phosphate precipitation, fusion of the recipient cells with bacterial protoplasts containing the
DNA, treatment of the recipient cells with liposomes containing the DNA, DEAE dextran,
electroporation and micro-injection of the DNA directly into the cells.

The transformed cells are cultured by means well known in the art (Biochemical
Methods in Cell Culture and Virology, Kuchler, R.J., Dowden, Hutchinson and Ross, Inc., (1977)). The
expressed polypeptides are isolated from cells grown as suspensions or as monolayers. The latter are
recovered by well known mechanical, chemical or enzymatic means.

2. Purification

The proteins produced by recombinant DNA technology may be purified by standard
techniques well known to those of skill in the art. Recombinantly produced proteins can be directly
expressed or expressed as a fusion protein. The protein is then purified by a combination of cell lysis
(e.g., sonication) and affinity chromatography. For fusion products, subsequent digestion of the fusion
protein with an appropriate proteolytic enzyme releases the desired polypeptide.

The polypeptides of this invention may be purified to substantial purity by standard
techniques well known in the art, including selective precipitation with such substances as ammonium
sulfate, column chromatography, immunopurification methods, and others. See, for instance, R.
Scopes, Protein Purification: Principles and Practice, Springer-Verlag: New York (1982), incorporated
herein by reference. For example, in an embodiment, antibodies may be raised to the proteins of the
invention as described herein. Cell membranes are isolated from a cell line expressing the
recombinant protein, the protein is extracted from the membranes and immunoprecipitated. The
proteins may then be further purified by standard protein chemistry techniques as described above.

3. Antibodies

As mentioned above, antibodies can also be used for the screening of polypeptide
products encoded by the polymorphic nucleic acids of the invention. In addition, antibodies are useful
in a variety of other contexts in accordance with the present invention. Such antibodies can be utilized
for the diagnosis of HH and, in certain applications, targeting of affected tissues.

Thus, in accordance with another aspect of the present invention a kit is provided that
is suitable for use in screening and assaying for the presence of polypeptide products encoded by the
polymorphic nucleic acids of the invention by an immunoassay through use of an antibody which
specifically binds to polypeptide products encoded by the polymorphic nucleic acids of the invention in
combination with a reagent for detecting the binding of the antibody to the gene product.

Once hybridoma cell lines are prepared, monoclonal antibodies can be made through
conventional techniques of priming mice with pristane and intraperitoneally injecting such mice with the
hybrid cells to enable harvesting of the monoclonal antibodies from ascites fluid.

In connection with synthetic and semi-synthetic antibodies, such terms are intended to
cover antibody fragments, isotype switched antibodies, humanized antibodies (mouse-human,
human-mouse, and the like), hybrids, antibodies having plural specificities, fully synthetic antibody-like
molecules, and the like.

This invention also embraces diagnostic kits for detecting DNA or RNA comprising a
polymorphism of Table 1 in tissue or blood samples which comprise nucleic acid probes as described
herein and instructional material. The kit may also contain additional components such as labeled
compounds, as described herein, for identification of duplexed nucleic acids.

The following examples are provided to illustrate the invention but not to limit its scope.
Other variants of the invention will be readily apparent to one of ordinary skill in the art and are
encompassed by the appended claims.

F. EXPERIMENTAL EXAMPLES

1. Megabase transcript map

In these studies direct selection, exon-trapping, and genomic sample sequencing were
used to generate a transcript map of a 1 megabase region approximately 8.5 megabases telomeric to
HLA-A in the vicinity of HFE. This region, 6p21.3, was flanked by the genetic markers D6S2242 and
D6S2241. The starting material for these experiments was a 1 megabase YAC labeled y899g1 and a
bacterial clone contig of this region (Feder et al. Nature Genetics 13:399-408 (1996)). These
techniques and other methods used in the study are outlined below.
a. Direct Selection (DS)

Poly A+ RNA from human fetal brain, liver and small intestine (Clontech, Palo Alto,
CA) was converted into cDNA using random primers and a Superscript cDNA synthesis kit (Life
Technologies, Gaithersburg, MD). The cDNA was digested with Mbo I and ligated to cDNA Mbo I
linker-adaptors. Unligated linker-adaptors were removed by passage through cDNA spun columns
(Pharmacia, Piscataway, NJ). Six ng of each of the ligated cDNAs was amplified using the cDNA
Mbo I-S primer (5'-CTGATGCTCGAGTGAATTC-3'). The amplified products were purified on S-400
spin columns (Pharmacia, Piscataway, NJ), ethanol precipitated and resuspended at 1 mg/ml in TE.
Gel-purified yac899g1 (Centre d'Etude du Polymorphisme Humain) was processed as described by
Morgan et al. (Nucl. Acids Res. 20:5173-5179 (1992)). The cDNAs were mixed in equal molar
amounts for a total of 3 mg, and blocked with a mixture of 4 mg Cot-1 DNA (Life Technologies,
Gaithersburg, MD), and a cocktail of Sau 3A-digested ribosomal and five different histone DNAs. The
blocked cDNAs were hybridized to biotinylated yac899g1 DNA and streptavidin capture was carried out
as described by Morgan et al. (ibid). After the second round of selection, the eluted cDNAs were
amplified using the cDNA Mbo I-S primer which included a (CUA)4 repeat at the 5' end to facilitate
cloning into a version of pSP72 (Promega, Madison, WI) constructed for use with uracil-DNA
glycosylase cloning (UDG, Life Technologies, Gaithersburg, MD). Recombinants were transformed into
DH5α, 1000 clones were picked into a 96 well format, and clones were prepped for DNA sequencing
using the AGTC boiling 96-well mini-prep system (Advanced Genetic Technologies, Gaithersburg, MD).
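
The Mbo I digestion step above can be pictured with a small in-silico digest. Mbo I recognizes GATC and cuts immediately 5' of the G; the function name and the toy sequence below are hypothetical, and the sketch treats the cDNA as a single linear strand, ignoring methylation sensitivity and incomplete digestion.

```python
MBO_I_SITE = "GATC"  # Mbo I cuts immediately 5' of the G: ^GATC


def mbo1_fragments(seq):
    """Split a linear DNA string at every Mbo I site and return the fragments."""
    cuts = []
    pos = seq.find(MBO_I_SITE)
    while pos != -1:
        cuts.append(pos)  # the cut falls just before the G of GATC
        pos = seq.find(MBO_I_SITE, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    # Skip the empty fragment that arises when the sequence starts with a site
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


toy_cdna = "AAGATCTTGATCCC"  # hypothetical sequence with two Mbo I sites
fragments = mbo1_fragments(toy_cdna)
```

Every internal fragment begins with GATC, which is the cohesive end the Mbo I linker-adaptors are ligated to.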

Four hundred and sixty five clones were sequenced and the resulting data searched
by BLAST (Altschul et al. J. Mol. Biol. 215:403-410 (1990)). Those clones representing repetitive,
bacterial, yeast, mitochondrial and histone sequences were eliminated from further consideration. The
remaining sequences were then searched for overlaps and assembled into 108 unique DS contigs.
The number of clones per DS contig varied from 1 to 22, with the length of each contig ranging
from 250 bp to 850 bp. Small sequence-tag-site PCR assays were developed for each DS contig and
two experiments were carried out concomitantly: mapping each DS contig back to the bacterial clone
contig of the region and testing for the presence of each DS contig in cDNA libraries. Overall, 86, or
80%, of the DS contigs mapped back to the region and were found to be in cDNA libraries. The figure
of 80% mapping to the region was probably an underestimate of the fidelity of the direct selection, since
PCR assays which cross exon-intron boundaries would be expected to fail or give larger size products,
thereby being scored negative.
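
The 80% figure follows directly from the contig counts given above. The short Python check below reproduces the arithmetic; the variable names are illustrative only.

```python
total_ds_contigs = 108      # unique DS contigs assembled
mapped_and_in_cdna = 86     # contigs that mapped back and were found in cDNA libraries

fraction = mapped_and_in_cdna / total_ds_contigs
percent = round(100 * fraction)          # 79.6% rounds to the 80% quoted in the text
not_confirmed = total_ds_contigs - mapped_and_in_cdna
```

The remaining 22 contigs include any whose PCR assays failed because they crossed exon-intron boundaries, which is why the text treats 80% as a lower bound.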
b. Exon-Trapping

CsCl-purified genomic P1 (Genome Systems), BAC (Research Genetics) and PAC
(Genome Systems) DNAs were digested with BamHI, Bgl II, Pst I, Sac I and Xho I, and 125 ng of each
digest was ligated into 500 ng pSPL3 (Church et al. Nature Genetics 6:98-105 (1994)) (Life Technologies,
Gaithersburg, MD) digested with the appropriate restriction enzyme and phosphatased with calf
intestinal alkaline phosphatase (USB, Cleveland, OH). One tenth of the ligation was used to transform
XL1-Blue MRF' cells (Stratagene, La Jolla, CA) by electroporation. Nine tenths of the electroporation
was used to inoculate 10 ml of LB + 100 µg/ml of carbenicillin and, after overnight growth, DNA was
prepared using Qiagen Q-20 tips (Qiagen GmbH, Hilden, Germany). The remaining one tenth was
plated on LB + 100 µg/ml carbenicillin plates to evaluate the efficiency of cloning and to test
individual clones for the presence of single inserts. COS-7 cells were seeded overnight at a density of 1.4
x10*/well in 6 well dishes. One µg of DNA was transfected using 6 ml of Lipofect-Ace. Cytoplasmic
RNA was isolated 48 hr post-transfection. RT-PCR was carried out as described by Church et al. (ibid)
using commercially available reagents (Life Technologies, Gaithersburg, MD). The resulting
CUA-tailed PCR fragments for each restriction digested bacterial clone were pooled and UDG cloned
into pSP72-U (a derivative of pSP72). The DNA was transformed in DH5α and the cells plated onto
nylon membranes. After overnight growth, duplicates were made and the DNA hybridized to 32P
end-labeled oligos designed to detect various background products associated with the pSPL3 vector.
One set of filters was hybridized with the following gel-purified oligos in 6X SSC aqueous hybridization
solution at 42° C:

vector-vector splicing   5'-CGAGCCAGCAACCTGGAGAT-3'
cryptic donor-1021       5'-AGCTCGAGGGGCGGCTGCAG-3'
cryptic donor-1134       5'-AGACCCCAAGCCACAAGAAG-3'

The filters were washed twice in 6X SSC, 10 mM sodium pyrophosphate (NaPPi) at 60° C, 30 mins.

After overnight autoradiography, non-hybridizing clones were picked and grown in 250
µl of LB + 100 µg/ml of carbenicillin in 96 well mini-rack tubes. The samples were analyzed by PCR
using the secondary PCR primers supplied in the kit (Life Technologies, Gaithersburg, MD) and those
clones with inserts greater than 200 bp were selected for sequencing.

Ninety-six exon traps per bacterial clone were sequenced for a total of 768 reactions
and the resulting data analyzed by BLAST. In addition, each potential exon was searched against a
database of the 86 DS contigs to eliminate redundant sequences. PCR assays were developed for
each of the potential exons and they were tested for their presence in cDNA libraries. A total of 48
potential exons remained after these screening steps.
c. Sample Sequencing

A minimal set of bacterial clones chosen to cover y899g1 were prepped with the
Qiagen Maxi-Prep system and purified on CsCl. Ten micrograms of DNA from each bacterial clone
was sonicated in a Heat Systems Sonicator XL and end-repaired with Klenow (USB) and T4
polymerase (USB). The sheared fragments were size selected between three to four kilobases on a
0.7% agarose gel and then ligated to BstXI linkers (Invitrogen). The ligations were gel purified on a
0.7% agarose gel and cloned into a pSP72 derivative plasmid vector. The resulting plasmids were
transformed into electrocompetent DH5α cells and plated on LB-carbenicillin plates. A sufficient
number of colonies was picked to achieve 15-fold clone coverage. The appropriate number of
colonies was calculated by the following equation to generate a single-fold sequence coverage:
Number of colonies = size of bacterial clone (in kb)/average sequence read length (0.4 kb). These
colonies were prepped in the 96-well AGTC system and end-sequenced with oligo MAPI using
standard ABI Dye Terminator protocols. MAPI was CGTTAGAACGCGGCTACAAT. The MAPI
sequences were screened locally with the BLAST algorithm against all available public databases. All
sequence identities were catalogued and cross referenced to the DS and exon-trapped databases.
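
The colony-number equation above can be written as a short Python helper. Sizes are expressed in base pairs so that the division stays in integer arithmetic; the function name, the `fold` parameter, and the 100 kb example clone are assumptions made for illustration, while the 0.4 kb (400 bp) average read length is the value stated in the text.

```python
def colonies_for_coverage(clone_size_bp, read_length_bp=400, fold=1):
    """Number of colonies (one end read each) needed for `fold` sequence coverage.

    Implements: colonies = clone size / average read length, rounded up.
    """
    if clone_size_bp <= 0 or read_length_bp <= 0 or fold <= 0:
        raise ValueError("sizes and fold must be positive")
    total_bases = clone_size_bp * fold
    return -(-total_bases // read_length_bp)  # ceiling division on integers


# Hypothetical 100 kb bacterial clone at single-fold coverage
colonies = colonies_for_coverage(100_000)
```

For the hypothetical 100 kb clone this gives 250 colonies at single-fold sequence coverage.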

A total of 3794 end sequence reactions were run to achieve the theoretical 1X
coverage. Eighty-five percent of these sequences contained non-bacterial non-vector inserts. An
additional 1060 end sequence reactions were run from the opposite end of the cloning vector to
augment the sequence coverage and to prepare for contigging across selected regions. BLAST
searches to all publicly available databases identified 12 histone genes and 74 unique expressed
sequence fragments (ESF). The ESF represent a collection of ESTs and other expressed sequence
fragments that were selected due to their sequence identity over a significant portion of genomic DNA.
The ESF were cross referenced against the DS and exon-trapped databases to eliminate
redundancies. Fifty-eight unique ESF remained, representing 39 distinct clones. Included in these ESF are 5
sequences homologous to histone genes.



Table 3. ESTs found by Sample Sequencing Large Insert Bacterial Clones

Clone name  Bacterial   Homology 5'     Homology 3'    Poly A+   Genomic   cDNA
            clone       blastx          blastx         signal*   poly(A)   Homology

EST03556    pc157c3     na^             none'          +                   cDNA 28
ym33f11     pc157c3     ZNF             na             na        na
EST04698    pc157c3     na              NSH*           +
EST04812    pc157c3     na              NSH
yb89b08     pc157c3     NSH             na             na        na
yd88g11     pc157c3     na              NSH            +
yj49b01     pc157c3     NSH             na             na        na
yv81d05     pc157c3     HG17 Human      NSH            +                   cDNA 30
yg57h09     p196e20     BUTYBOVIN       NSH            +                   cDNA 21
yq23d08     p196e20     BUTYBOVIN       NSH            +                   cDNA 21
yo65f06     p196e20     NSH             na             na        na        cDNA 29
yv88c09     p196e20     BUTYBOVIN       na             na        na        cDNA 29
yd17d06     p196e20     NSH             na             na        na        cDNA 23
ye25g03     p196e20     BUTYBOVIN       NSH            na        na        cDNA 44
ys04h08     pc45p21     NSH             NSH            +                   cDNA 44
yn01c05     p196e20     BUTYBOVIN       na             na        na        cDNA 32
yg78f10     pc45p21     NSH             NSH            na        na
yh54f11     p196e20     none            NSH
ys05b08     pc157c3     NSH             Alu                      +
yb12h11     b132a12     NSH             Histone H3.1
HSC2EE082   b132a12     na              NSH
HUMieOhllb  b132a12     none            na             na        na
yg04f09     b132b12     Line element    Alu                      +
yd37d11     b132a12     NSH             Alu            -         +
ym29g03     b132a12     Histone H2A     NSH                                cDNA 37
yi77b02     b132a12     NSH             NSH                                cDNA 37
yh76b05     b132a12     NSH             Alu
yu98e02     b132a12     NSH             Alu                      +
yd72h12     b132a12     Alu             NSH            +         +
yd19d03     pc222k22    Histone H2B.1   NSH            +         -
ye98g01     b132a12     NSH             NSH                                cDNA
yi61f07     b132a12     NSH             NSH                      +
EST05340    b3e17       na              Alu                      +
yd35d05     pc222k22    NSH             NSH                      +
yc52a05     pc75L14     NSH             na             na        na
yd84a05     pc75L14     none            none

1 yr42a05 


pc75L14 


NaPi transport 


none 


+ 




CDNA22B 1 


yd83h08 


b20h20 


NSH 


none 


+ 






ye38c09 


b20h20 


NSH 


Alu 




+ 




yp74c05 


b20h20 


NaPi transport 


Alu 




na 





Bracketed area is the critical region 

1 Signal of ATAAA or ATTAA 4 No Significant Homologies 

2 Not available 5 3* splice that is not on contig 

3 "NONE" reported by blast 6 Poor EST sequence 

d. cDNA library screening 

Superscript plasmid cDNA libraries, brain, liver and testis, were purchased from Life
Technologies, Gaithersburg, MD. Colonies were plated on Hybond N filters (Amersham) using
standard techniques. Insert probes from DS, exons and EST (I.M.A.G.E. clones; Genome Systems)
were all isolated by PCR followed by purification in low-melting point agarose gels (Seakem). The
DNAs were labeled in gel using the Prime-It II kit (Stratagene, La Jolla, CA). Small exon probes were
labeled using their respective STS PCR primers instead of random primers. Up to 5 different probes
were pooled in a hybridization. Filters were hybridized in duplicate using standard techniques. Putative
positives were screened by PCR using the probe's STSs to identify clones. Inserts from positive clones
were subcloned in pSP72 and sequenced.

e. Northern blots and RT-PCR analysis 

Multiple tissue northern blots were purchased from Clontech and hybridized according
to the manufacturer's instructions. RT-PCR was carried out on random primed first strand cDNA made
from poly A+ RNA (Clontech) using AmpliTaq Gold (Perkin-Elmer). Control reactions were performed
on RNA samples processed in the absence of reverse transcriptase to control for genomic DNA
contamination.

f. Genomic Sequencing

The MAP1 sequences from the bacterial clones b132a2, 222K22, and 75L14 were
assembled into contigs with the Staden package (available from Roger Staden, MRC). A minimal set
of 3 kb clones was selected for sequencing with oligo labeled MAP2 that sits on the opposite end of the
plasmid vector. The sequence of MAP2 was GCCGATTCATTAATGCAGGT. The MAP2 sequences
were entered into the Staden database in conjunction with the MAP1 sequences to generate a tiling
path of 3 kb clones across the region. These sequences were also screened with the BLAST algorithm
and all novel sequence identities were noted. The plasmid 3 kb libraries were concurrently
transformed in 96 well format into pox38UR (available from C. Martin, Lawrence Berkeley
Laboratories). The transformants were subsequently mated with JGM (Strathman et al., P.N.A.S.
88:1247-1250 (1991)) in 96 well format. All matings of the 3 kb clones within the tiling path were
streaked on LB-carbenicillin-kanamycin plates and a random selection of 12 colonies per 3 kb clone
was prepped in the AGCT system. The oligos -21: CTGTAAAACGACGGCCAGTC, and REV:
GCAGGAAACAGCTATGACC were used to sequence off both ends of the transposon. Each 3 kb
clone was assembled in conjunction with the end sequence information from all bacterial clones to
generate complete sequence across the region. The genomic sequence was analyzed with the
BLAST nucleotide and protein homology algorithms and the GRAIL 1.2 software to identify novel open
reading frames (ORF) for gene finding.
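
Selecting a minimal set of clones that tiles a region can be pictured as an interval-covering problem. The sketch below uses a simple greedy rule and is not the procedure used in the study, where the tiling path was derived from the Staden assembly. The function name, the clone names and all coordinates are hypothetical.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Greedy minimal cover of [region_start, region_end] by clone intervals.

    `clones` is a list of (name, start, end) tuples with start < end.
    At each step the clone that starts within the covered part and reaches
    furthest to the right is chosen.
    """
    ordered = sorted(clones, key=lambda clone: clone[1])
    path = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Consider every clone that begins at or before the current frontier.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError("no clone extends coverage past position %d" % covered_to)
        path.append(best[0])
        covered_to = best[2]
    return path


# Hypothetical 3 kb clones placed on an 8 kb stretch (coordinates in bp).
example_clones = [
    ("clone_a", 0, 3000),
    ("clone_b", 1000, 4000),
    ("clone_c", 2500, 5500),
    ("clone_d", 5000, 8000),
    ("clone_e", 3500, 6000),
]
print(minimal_tiling_path(example_clones, 0, 8000))
```

A gap in coverage raises an error instead of returning a partial path, which mirrors the need to fill such gaps by other means before the sequence can be completed.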

g. Discussion 

A compilation of 174 ESF led to the construction of an expressed sequence map of
the region that served as the framework for the isolation of full-length cDNAs (Figure 1). (The map
shows the subset of ESF that were actually mapped.) Probes were developed for the 82 best ESFs which
appeared to be derived from the coding portions of cDNAs and the appropriate cDNA libraries were
screened. This led to the isolation of 19 cDNAs, 17 of which represented novel sequences. 70 of the
174 ESF were included in the cDNAs isolated (40%). 36 probes failed to produce any clones even
after repeated screening of several libraries. 51 ESF which were not accounted for in the cDNAs
cloned were not used in any screen. Therefore, it is possible that some additional genes within this 1
megabase region may have escaped detection.

A list of these cDNAs cloned and a comparison of the methods used to find them is
presented in Table 4. Direct selection found 14 out of the 18 cDNAs contained within the boundaries of
the YAC used in the experiment. Exon trapping found 15 out of the 19 cDNAs contained within the
boundaries of the large insert bacterial clone contig. Sample sequencing identified 11 genes that had
corresponding ESTs in the public database.

Table 4. Comparison of gene finding methods 



Bacterial Clone 


CDNA# 


Homology 


EST 


DS 


Exon Trap 

1 


I57c 


28 


zinc finger 


EST03556 


2 


157c3 


30 


nonhistone 


yv8ld05 


1 


none 








yvh07al0 






I57c3 


46 


ORF 


ydSSgU 


1 




157c3 


20 


BT 


none 


none 


3 


pi 8696 


21 


BTFI 


ynOIG5 


4 


5 








yg23d08 












yg57h09 












yul5h03 







45p21 


32 


BTF2 


yg78n0 








yn01c05 


45p21 


29 


BTF3 


ye25g03 








yo65<D6 


45p2I 


23 


BTF4 


ydl7d06 


45p21 


44 


BTF5 


ysO4h08 


3el7 


41 


genomic? 


none 


132a2 


43 


genomic? 


none 


132a2 


36 


genomic? 


none 


132a2 


37 


histone 2A 


ym29g03 








yh87a03 


75JI4 


24 


MHC class I 


ye98g01 


132a2 


39 


genomic? 


none 


132a2 


27 


Ro/SSA 


none 


132a2 


22B 


NPTl -like 


yr42a05 








yfO9g06 


20h20 


22£ 


NPTMike 


none 


20h20 


NPTl 


NPTl 


yp74c05 



4 
2 
none 

none 
I 
3 



none 
3 
1 

2 
N/A 



6 
4 
1 

3 

none 
none 

2 
4 
4 
7 

5 
3

As a final approach, a tiling path with overlapping end sequences from the sample
sequence database was generated. Each 3 kb clone within the path was shotgun-sequenced using
transposable elements as platforms for dual end sequencing. These individual clones were assembled
in conjunction with the end sequences from all bacterial clones in the region. The resulting sequence
(Figure 2) was analyzed systematically with BLAST homology searches and the Grail 1.2 program to
identify novel open reading frames (ORF) and other gene-like structures. The BLAST homology
searches did not produce any probes that had not already been identified by sample sequencing. Grail
predicted exons for all the genes in the region, but was only able to assemble the histones into any
representative form. A detailed analysis of BLAST homology searches to protein databases identified
an enticing homology to a zinc alpha 2 glycoprotein approximately 25 kb upstream of HFE, but the lack
of a substantial ORF and the presence of a stop codon suggested that it was a pseudogene. Figure 2
shows the positions, the exon and intron structures, and the relative orientation of transcription of novel
genes within this region. Also shown are the positions and transcriptional orientations of the histone
genes. A total of 12 histone genes were identified in this study.

In an effort to account for the ESTs that did not associate with the characterized genes
in the 250 kb region, the genomic sequence around the putative 3' ends was examined for
polyadenylation signals to determine whether certain EST sequences may have originated from
genomic DNA contamination in the normalized cDNA libraries used in EST generation. The positions
of the 14 ESTs found in this region are indicated in Figure 2 to show those associated with the cDNAs
cloned and those which did not associate with genomic DNA of obvious coding potential. Four ESTs
corresponded to 3 of the 4 cDNAs cloned from the region (Table 2). One EST encoded a histone
H2B.1 gene and another was a repetitive element. Of the remaining 8, 6 EST clones were used as
probes of cDNA libraries with negative results. Those sequences representing putative 3' ends of
cDNA were searched for the presence of poly (A)+ addition signals. Five of the 13 ESTs which had 3'
end sequence had the sequence ATAAA or ATTAA. Five of the remaining 8 ESTs that did not have a
poly (A)+ addition signal had genomic encoded stretches of poly (A) near the end of EST sequence
and, therefore, may have been created by oligo d(T) priming of contaminating genomic DNA. This
analysis was expanded to include all ESTs in the large-insert bacterial contigs with definitive 3' ends.
Of the remaining 26, 15 had 3' end sequence and, of these, 8 had poly (A)+ addition signals. Five of
these 8 ESTs were associated with the cloned cDNAs. Of the remaining 7 which did not have poly (A)+
addition signals, 4 had genomic encoded stretches of poly (A).
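
The two checks described above, a search for ATAAA or ATTAA near the 3' end of an EST and a search for a genomic poly (A) stretch, can be sketched in Python as follows. This is a minimal illustration: the 50-base search window, the 10-base minimum run length, the function names and the toy sequences are all assumptions, since the text does not state the thresholds that were used.

```python
import re

POLYA_SIGNALS = ("ATAAA", "ATTAA")  # the two signal sequences named in the text


def find_polya_signals(est_3prime, window=50):
    """Return sorted (0-based position, motif) hits within the last `window` bases."""
    seq = est_3prime.upper()
    offset = max(0, len(seq) - window)
    hits = []
    for motif in POLYA_SIGNALS:
        # A lookahead pattern also reports overlapping occurrences.
        for match in re.finditer("(?=" + motif + ")", seq[offset:]):
            hits.append((offset + match.start(), motif))
    return sorted(hits)


def has_genomic_polya(genomic_flank, min_run=10):
    """True if the genomic flank contains a run of at least `min_run` A residues."""
    return re.search("A{%d,}" % min_run, genomic_flank.upper()) is not None


def classify_est(est_3prime, genomic_flank):
    """Label an EST 3' end using the two checks described in the text."""
    if find_polya_signals(est_3prime):
        return "poly(A) signal present"
    if has_genomic_polya(genomic_flank):
        return "possible oligo d(T) priming of genomic DNA"
    return "unexplained"


# Toy sequences, invented for illustration only.
toy_est = "GGCC" * 10 + "ATAAA" + "GCGC" * 5
print(find_polya_signals(toy_est))
print(classify_est("GGCCGGCC", "GC" + "A" * 12))
```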

i. Butyrophilin gene family

The human homolog of the bovine butyrophilin gene (BT) was cloned and mapped to
approximately 480 kb centromeric to HFE (Figure 1). BT is a transmembrane protein of unknown
function which constitutes 40% of the total protein associated with the fat globule of bovine milk (Jack
et al., J. Biol. Chem. 265:14481-14486 (1990)). A human homolog of BT has recently been cloned by
Taylor et al. (Biochim. Biophys. Acta 1306:1-4 (1996)). The results in this study indicated that BT is a
member of a gene family with at least five other members of the family residing in this region (Figure
1). A comparison of these proteins is shown in Figure 3. The proteins were aligned based on their
descending order of relatedness and to minimize gaps in the sequence. Each of the five proteins
displays varying degrees of homology to BT. BTF1 (cDNA 21), BTF2 (cDNA 32), BTF5 (cDNA 44), and
BTF3 (cDNA 29) are 45%, 48%, 46%, and 49% identical to BT, whereas BTF4 (cDNA 23), which is
more similar to BTF3 (cDNA 29), is only 26% identical. This low degree of identity to BT is largely due
to a truncation at the carboxyl terminus of the protein. The BTF family falls into two groups: BTF1 and
2, which are more related to each other than to BT or the other BTF members, and BTF5, 3 and 4,
which appear to have a common evolutionary origin. The order of these genes on the chromosome
suggests that the BT gene has duplicated two times, giving rise to BTF1 and BTF5. Subsequently, it
appears likely these two genes experienced further duplication events to give rise to the other
members in their groups.
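
The effect of a carboxyl-terminal truncation on percent identity can be shown with a small Python sketch. It assumes identity is computed over the full alignment length, so that gap columns count against the score; the text does not state which convention was used for Figure 3, and the function name and toy sequences are invented for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over all alignment columns; gap columns never match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


# Toy alignment: the second sequence is identical but truncated at its end.
full_length = "MKTAYIAK"
truncated = "MKTA----"
print(percent_identity(full_length, full_length))
print(percent_identity(full_length, truncated))
```

Under this convention a protein that matches perfectly over its first half but lacks the second half scores 50%, which is the kind of effect the text attributes to the truncation in BTF4.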

The three major components of BT, the B-G immunoglobulin superfamily domain
(containing the V consensus sequence) (Miller et al., Proc. Natl. Acad. Sci. U.S.A. 88:4377-4381
(1991)), the transmembrane region, and the B30-2 exon are found in all of these proteins (with the
exception of BTF4 (cDNA 23) which lacks the B30-2 exon by virtue of the carboxyl terminal truncation).
The exon B30-2 is a previously noted feature of the MHC class I region found approximately 200 kb
centromeric to the HLA-A gene (Vernet et al., J. Mol. Evol. 37:600-612 (1993)). In addition this exon is
found in several genes of diverse function telomeric to HLA-A, namely MOG (approximately 200 kb)
and RFP (approximately 1 megabase) (Amadou et al., Genomics 26:9-20 (1995)).

The levels of the BTF mRNA were analyzed by northern blot analysis (Figure 4A).
The expression of the BTF genes fell into two patterns. BTF1 and BTF2 were expressed as a single
major transcript of 2.9 kb and one minor transcript of 5.0 kb. These genes were expressed at high
levels in all the tissues tested with the exception of the kidney, where the expression level was lower. The
two genes are 90% identical at the DNA sequence level; therefore, it is possible that the signal
observed on the northerns was the result of cross-hybridization and only one of the two genes was
actually expressed. To address this possibility, RT-PCR experiments were carried out on a panel of
different tissues in order to detect possible tissue dependent expression that would suggest that both
genes are expressed. Identical, and thus equivocal, results were obtained with both BTF1 and BTF2
amplification (Figure 4B).

The second group of genes, BTF3-5, are expressed as three (BTF5) (Figure 4A) and
two (BTF3 and 4) transcripts ranging from 4.0 to 3.3 kb. BTF5 is expressed at moderate levels in all
tissues tested with the exception of the kidney, where the expression level is lower. RT-PCR
experiments showed that mRNA from the BTF5 gene can be found in all tissues tested, including the
kidney (Figure 4B). Identical results were obtained with primers from the other genes of this group
(data not shown). These genes are also 90% identical to each other at the DNA sequence level (but
only 58% identical to BTF1 and 2); hence, like BTF1 and BTF2, cross-hybridization could account for
the similarity in size and patterns on the northern blots and RT-PCR. This might be particularly true for
BTF4, which lacks the B30-2 exon but still hybridizes to larger size transcripts like BTF5 and BTF3.

ii. A gene with similarity to 52 kD Ro/SSA auto-antigen

Located approximately 120 kb telomeric to the HFE gene is a gene, RoRet, that has
58% amino acid similarity to the 52 kD Ro/SSA protein, an auto-antigen of unknown function that is
frequently recognized by antibodies in patients with systemic lupus and Sjogren's syndrome (Anderson
et al., Lancet 2:456-560 (1961); Clark et al., J. Immunol. 102:117-122 (1969)) (Figures 1 and 2).
Alignment of the predicted amino acid sequence of this cDNA with that of 52 kD Ro/SSA indicated that
two features associated with the 52 kD Ro/SSA protein, a putative DNA binding cysteine rich motif
(C-X-(I,V)-C-X(11-30)-C-X-H-X-(F,I,L)-C-X(2)-C-(I,L,M)-X(10-18)-C-P-X-C) found at the N terminus
(Freemont et al., Cell 64:483-484 (1991)) and the B30-2 exon found near the carboxyl terminus, are
both conserved in RoRet (Figure 5). Northern blot analysis indicated the RoRet gene was expressed
as two major transcripts of 2.8 and 2.2 kb and two minor transcripts of 7.1 and 4.4 kb in all of the
tissues on the blot at levels reflective of the RNA amounts as determined by β-actin probing (Figure
6A). Using RT-PCR, expression can also be detected in small intestine, kidney, liver, and spleen
(Figure 6B).

iii. Two genes with homology to a sodium phosphate transporter

A cDNA for a sodium phosphate transport protein (NPT1) was previously cloned and
mapped to 6p21.3 using a somatic cell hybrid panel (Chong et al., Genomics 18:355-359 (1993)).
NPT1 maps 320 kb telomeric to the HFE gene (Figures 1 and 2). Two additional cDNAs were cloned
which show appreciable homology to NPT1 (Figure 5). These genes, NPT3 and NPT4, mapped 1.5
megabases and 1.3 megabases centromeric to the NPT1 gene (Figure 1). Like NPT1, the gene
products of NPT3 and NPT4 were extremely hydrophobic, which may reflect a membrane location.
Both proteins gave hydrophilicity profiles which were indistinguishable from NPT1 in this study (data not
shown). Northern blot analysis indicated that the two genes have different patterns of expression
(Figure 6C). NPT3 was expressed at high levels as a 7.2 kb transcript predominantly in muscle and
heart. Lesser amounts of the mRNA were also found in brain, placenta, lung, liver and pancreas.
RT-PCR analysis indicated that expression of the proper size PCR fragment for NPT3 was clearly
absent in fetal brain, bone marrow and small intestine (Figure 6D). A smaller size fragment was
detectable in all tissues with the exception of the liver, which may represent evidence for alternative
splicing. Although expression was apparently absent from the kidney by northern blot analysis, it was
detectable by RT-PCR. Expression was also noted in the mammary gland, spleen and testis. NPT4,
on the other hand, was expressed only in the liver and the kidney as a smear of transcripts
approximately 2.6 - 1.7 kb (Figure 6C). RT-PCR confirmed these results, although a small amount of
the proper size PCR fragment was also found in the small intestine and testis (Figure 6D). Other
tissues showed amplification, but the fragments were of larger and smaller size than that produced by
the cDNA 22E positive control. Hence, these two genes, which apparently have the structural
characteristics of a sodium phosphate transporter, appeared to be under the control of different
regulatory mechanisms that lead to differential patterns of expression.

2. Sequencing of 235 kb from a Homozygous Ancestral (Affected) Individual

In these studies the entire genomic sequence was determined from an HH affected
individual for a region corresponding to a 235,033 bp region surrounding the HFE gene between the
flanking markers D6S2238 and D6S2241. The sequence was derived from a human lymphoblastoid
cell line, HC14, that is homozygous for the ancestral HH mutation and region. The sequence from the
ancestral chromosome (Figure 9) was compared to the sequence of the region in an unaffected
individual (Figure 8) disclosed in copending U.S.S.N. 08/724,394 to identify polymorphic sites. A
subset of the polymorphic alleles so defined were further studied to determine their frequency in a
collection of random individuals.

The cell line HC14 was deposited with the ATCC on June 25, 1997, and is designated
ATCC CRL-12371.

a. Cosmid Library Screening 

The strategy and methodology for sequencing the genomic DNA for the affected
individual was essentially as described in copending U.S.S.N. 08/724,394, hereby incorporated by
reference in its entirety. Basically, a cosmid library was constructed using high molecular weight DNA
from HC14 cells. The library was constructed in the supercos vector (Stratagene, La Jolla, CA).
Colonies were replicated onto Biotrans nylon filters (ICN) using standard techniques. Probes from
genomic subclones used in the generation of the sequence of the unaffected sequence disclosed in
08/724,394 were isolated by gel electrophoresis and electroporation. Subclones were chosen at a
spacing of approximately 20 kb throughout the 235 kb region. The DNA was labeled by incorporation
of 32P dCTP by the random primer labeling approach. Positively hybridizing clones were isolated to
purity by a secondary screening step. Cosmid insert ends were sequenced to determine whether full
coverage had been obtained, and which clones formed a minimal path of cosmids through the 235 kb
region.

b. Sample Sequencing 

A minimal set of cosmid clones chosen to cover the 235 kb region were prepped with
the Qiagen Maxi-Prep system. Ten micrograms of DNA from each cosmid preparation were sonicated
in a Heat Systems Sonicator XL and end-repaired with Klenow (USB) and T4 DNA polymerase (USB).
The sheared fragments were size selected between three to four kilobases on a 0.7% agarose gel and
then ligated to BstXI linkers (Invitrogen). The ligations were gel purified on a 0.7% agarose gel and
cloned into a pSP72 derivative plasmid vector. The resulting plasmids were transformed into
electrocompetent DH5α cells and plated on LB-carbenicillin plates. A sufficient number of colonies
was picked to achieve 15-fold clone coverage. The appropriate number of colonies was calculated by
the following equation to generate a single-fold sequence coverage: Number of colonies = size of
bacterial clone (in kb)/average sequence read length (0.4 kb). These colonies were prepped in the
96-well Qiagen REAL, and the 5' to 3' DNA Prep Kit, and AGCT end-sequenced with oligo MAP1 using
standard ABI Dye Terminator protocols. MAP1 was CGTTAGAACGCGGCTACAAT.

c. Genomic Sequencing 

The MAP1 sequences from the cosmid clones HC182, HC187, HC189, HC195,
HC199, HC200, HC201, HC206, HC207, and HC212 were assembled into contigs with the Staden
package (available from Roger Staden, MRC). A minimal set of 3 kb clones was selected for
sequencing with oligo labeled MAP2 that sits on the opposite end of the plasmid vector. The sequence
of MAP2 was GCCGATTCATTAATGCAGGT. The MAP2 sequences were entered into the Staden
database in conjunction with the MAP1 sequences to generate a tiling path of 3 kb clones across the
region. The plasmid 3 kb libraries were concurrently transformed in 96 well format into pox38UR
(available from C. Martin, Lawrence Berkeley Laboratories). The transformants were subsequently
mated with JGM (Strathman et al., P.N.A.S. 88:1247-1250 (1991)) in 96 well format. All matings of the
3 kb clones within the tiling path were streaked on LB-carbenicillin-kanamycin plates and a random
selection of 12 colonies per 3 kb clone was prepped in the AGCT system. The oligos -21:
CTGTAAAACGACGGCCAGTC, and REV: GCAGGAAACAGCTATGACC were used to sequence off
both ends of the transposon. Each 3 kb clone was assembled in conjunction with the end sequence
information from all cosmid clones in the region.

In some regions, the coverage of the genomic sequence by cosmids was incomplete.
Any gaps in the sequence were filled by using standard PCR techniques to amplify genomic DNA in
those regions and standard ABI dye terminator chemistry to sequence the amplification products.

d. Identification of Polymorphic Sites

The assembled sequence of the cosmid clones in connection with the PCR amplified
genomic DNA was compared to the genomic sequence of the unaffected individual using the FASTA
algorithm. Numeric values were assigned to the sequenced regions of 1 to 235,303, wherein base 1
refers to the first C in the CA repeat of D6S2238 and base 235,303 is the last T in the GT repeat of
D6S2241 of the unaffected sequence (Figure 8). Table 1 lists the differences between the two
compared sequences. Note that previously disclosed (Feder et al., Nature Genetics 13:399-408
(1996)) polymorphic sites D6S2238 (base 1), D6S2241 (base 235,032), 24d1 (base 41316), and
D6S2239 (base 84841) are not included in the list of new polymorphisms, although they are provided
for reference in a footnote to the Table and were observed in the ancestral sequence. In the Table, a
single base change such as C-T refers to a C in the unaffected sequence at the indicated base position
that occurred as a T in the corresponding position in the affected sequence. Similarly, an insertion of
one or more bases, such as TTT in the affected sequence, is represented as "TTT INS" between the
indicated bases of the unaffected sequence. A deletion of one or more bases occurring in the affected
sequence, such as AAA DEL, is represented as the deletion of the indicated bases in the unaffected
sequence.
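
The notation described above can be made concrete with a short Python sketch that applies a single change to an unaffected sequence to obtain the affected sequence. The function name, the way positions are passed and the toy sequence are assumptions made for illustration; positions are 1-based, as in the numbering of the unaffected sequence.

```python
def apply_change(unaffected, position, change):
    """Apply one change written as 'C-T', 'TTT INS' or 'AAA DEL'.

    position: 1-based base of the unaffected sequence. For an insertion the
    new bases are placed between `position` and `position + 1`; for a
    deletion `position` is the first deleted base.
    """
    parts = change.split()
    if len(parts) == 2 and parts[1] == "INS":
        return unaffected[:position] + parts[0] + unaffected[position:]
    if len(parts) == 2 and parts[1] == "DEL":
        bases = parts[0]
        start = position - 1
        if unaffected[start:start + len(bases)] != bases:
            raise ValueError("deleted bases do not match the unaffected sequence")
        return unaffected[:start] + unaffected[start + len(bases):]
    ref, alt = change.split("-")
    if unaffected[position - 1] != ref:
        raise ValueError("unaffected base does not match the stated change")
    return unaffected[:position - 1] + alt + unaffected[position:]


# Toy unaffected sequence, invented for illustration.
toy_unaffected = "GATCAAAGT"
print(apply_change(toy_unaffected, 4, "C-T"))
print(apply_change(toy_unaffected, 4, "TTT INS"))
print(apply_change(toy_unaffected, 5, "AAA DEL"))
```

Checking the stated unaffected base or deleted bases before applying the change catches position errors early, which is useful when the two sequences are numbered differently.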

e. Characterization of Rare Polymorphisms

In this study about 100 of the polymorphisms of Table 1 were arbitrarily chosen for
further characterization. Allele frequencies in the general population were estimated by OLA analysis
using a population of random DNAs (the "CEPH" collection, J. Dausset et al., Genomics 6(3):575-577
(1990)). These results are provided in Table 2.

One single base pair difference, occurring at base 35983 and designated
C182.1G7T/C (an A to G change on the opposite strand), was present in the ancestral chromosome
and rare in the random DNAs. This change occurred in a noncoding region of the hemochromatosis
gene near exon 7 approximately 5.3 kb from the 24d1 (Cys282Tyr) mutation. OLA was used to
genotype 90 hemochromatosis patients for the C182.1G7T/C base pair change. The frequency for C
occurring at this position in the patients was 79.4% as compared to 5% in the random DNAs.
Eighty-five of the 90 patients assayed contained identical 24d1 and C182.1G7T/C genotypes. Four of
the remaining 5 patients were homozygous at 24d1 and heterozygous at C182.1G7T/C; one was
heterozygous at 24d1 and homozygous at C182.1G7T/C. The primers used for this analysis were as
follows.

PCR primers for detection:

182.1G7.F 5'-GCATCAGCGATTAACTTCTAC-3'

182.1G7.R 5'-TTGCATTGTGGTGAAATCAGGG-3'
For the detection assay, the biotinylated primers used were as follows.

182.1G7.C 5' (b)CTGAGTAATTGTTTAAGGTGC-3'

182.1G7.T 5' (b)CTGAGTAATTGTTTAAGGTGT-3'
The phosphorylated digoxigenin-labeled primer used was:

182.1G7.D 5' (p)AGAAGAGATAGATATGGTGG-3'
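
The two biotinylated detection primers listed above differ only in their 3'-terminal base, which is the base that distinguishes the C and T alleles. A small Python check of that property is sketched below; the function name is an assumption, while the primer sequences are the ones given in the text.

```python
def three_prime_allele_bases(primer_a, primer_b):
    """Return the two 3'-terminal bases if the primers differ only there, else None."""
    if not primer_a or len(primer_a) != len(primer_b):
        return None
    if primer_a[:-1] != primer_b[:-1] or primer_a[-1] == primer_b[-1]:
        return None
    return primer_a[-1], primer_b[-1]


# Biotinylated detection primers 182.1G7.C and 182.1G7.T from the text.
PRIMER_C = "CTGAGTAATTGTTTAAGGTGC"
PRIMER_T = "CTGAGTAATTGTTTAAGGTGT"
print(three_prime_allele_bases(PRIMER_C, PRIMER_T))
```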

A further rare single base pair change was detected at 61,465 bp. The inheritance
pattern of this polymorphism, C195.1H5C/T (a G to A change on the opposite strand), is identical to
that of 24d1. The frequency of T occurring at that position (C195.1H5T) observed in a set of 76
patients was 78.5% as compared to 5% in random individuals.
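
Allele frequencies such as those quoted above are obtained by counting alleles across diploid genotypes. The following Python sketch shows the arithmetic only. The genotype counts in the example are hypothetical, since the text reports the resulting frequencies but not the genotype breakdown, and the function name is an assumption.

```python
def allele_frequency(homozygous_alt, heterozygous, homozygous_ref):
    """Frequency of the alternate allele from diploid genotype counts.

    Each homozygous-alternate individual carries two copies of the allele,
    each heterozygote one copy, and each individual contributes two alleles.
    """
    individuals = homozygous_alt + heterozygous + homozygous_ref
    if individuals == 0:
        raise ValueError("at least one genotyped individual is required")
    return (2 * homozygous_alt + heterozygous) / (2 * individuals)


# Hypothetical genotype counts, for illustration only.
print(allele_frequency(homozygous_alt=1, heterozygous=2, homozygous_ref=1))
```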



PCR primers for detection:

195.1H5.3F 5'-GAATGTGACCGTCCCATGAG-3'

195.1H5.3R 5'-CAACTGAATATGCAGAAAAAAGTACACG-3'
For the detection assay, the biotinylated primers used were:

195.1H5.3.4 5' (b)AGTAGCTGGGACTCACGGTGT-3'

195.1H5.3.5 5' (b)AGTAGCTGGGACTCACGGTGC-3'
The phosphorylated digoxigenin-labeled primer used was:

195.1H5.3.6 5' (p)GCGCCACCACTCCCAGCTCAT-3'

These rare alleles are thus preferred surrogate markers for 24d1 and are especially
useful in screening assays for the likely presence of 24d1 and/or 24d2.

All publications, patents, and patent applications cited herein are hereby incorporated
by reference in their entirety.

WHAT IS CLAIMED IS:

1. An oligonucleotide comprising at least 8 to about 100 consecutive bases from the
sequence of Figure 9, or the complement of the sequence, wherein the at least 8 to about 100
consecutive bases includes at least one polymorphic site of Table 1.

2. The oligonucleotide of claim 1, wherein the polymorphic site is selected from the
group consisting of base 35983 or base 61465.

3. An oligonucleotide pair selected from the sequence of Figure 9 or its complement for
amplification of a polymorphic site of Table 1.

4. An isolated nucleic acid molecule comprising about 100 consecutive bases to about
235 kb substantially identical to the sequence of Figure 9, wherein the DNA molecule comprises at
least one polymorphic site of Table 1.

5. The isolated nucleic acid molecule of claim 4, wherein the polymorphic site is selected
from the group consisting of base 35983 or base 61465.

6. The isolated nucleic acid molecule of claim 4, wherein the nucleic acid is selected
from the group consisting of cDNA, RNA, or genomic DNA.

7. A polypeptide encoded by the nucleic acid molecule of claim 4.

8. An antibody which specifically recognizes the polypeptide of claim 7.

9. A method to determine the presence or absence of the common hereditary
hemochromatosis (HFE) gene mutation in an individual comprising:

providing DNA or RNA from the individual; and

assessing the DNA or RNA for the presence or absence of a haplotype of Table 1,
wherein, as a result, the absence of a haplotype of Table 1 indicates the likely absence of the
HFE gene mutation in the genome of the individual and the presence of the haplotype indicates the
likely presence of the HFE gene mutation in the genome of the individual.

10. The method of claim 9, wherein the method further comprises assessing the RNA or
DNA for the presence of at least one of the polymorphisms 24d1, 24d2, HHP-1, HHP-19, or HHP-29;
or microsatellite repeat alleles 19D9:205, 18B4:235, 1A2:239, 1E4:271, 24E2:245, 2B8:206,
3321-1:98, 4073-1:182, 4440-1:180, 4440-2:139, 731-1:177, 5091-1:148, 3216-1:221, 4072-2:170,
950-1:142, 950-2:164, 950-3:165, 950-4:128, 950-6:151, 950-8:137, 63-1:151, 63-2:113, 63-3:169,
65-1:206, 65-2:159, 68-1:167, 241-5:108, 241-29:113, 373-8:151, 373-29:113, D6S258:199,
D6S105:124, D6S306:238, D6S464:206, or D6S1001:180.

11. The method of claim 9, wherein the haplotype comprises at least two polymorphic
sites of Table 1.

12. The method of claim 11, wherein one of the at least two polymorphic sites of Table 1
is at base 35983 or 61465.

13. The method of claim 11, wherein the haplotype comprises at least three polymorphic
sites of Table 1.

14. A method to determine the presence or absence of the common hereditary
hemochromatosis (HFE) gene mutation in an individual comprising:

providing DNA or RNA from the individual; and

assessing the DNA or RNA for the presence or absence of a genotype defined by a
polymorphic allele of Table 1,
wherein, as a result, the absence of a genotype defined by a polymorphic allele of Table 1
indicates the likely absence of the HFE gene mutation in the genome of the individual and the
presence of the genotype indicates the likely presence of the HFE gene mutation in the genome of the
individual.

15. The method of claim 15, wherein the polymorphic allele occurs in less than about 50%
of a random population of individuals.

16. The method of claim 15, wherein the polymorphic allele occurs in less than about 25%
of a random population of individuals.

17. The method of claim 15, wherein the polymorphic allele occurs in less than about 5%
of a random population of individuals.

18. The method of claim 15, wherein the genotype is C182.1G7C or C195.1H5T.

19. A kit comprising one or more oligonucleotides of claim 1.

20. A kit comprising at least one oligonucleotide pair of claim 3.

21. A culture of lymphoblastoid cells having the designation ATCC CRL-12371.

22. An isolated nucleic acid sequence comprising a sequence substantially identical to
BTF1.

23. The isolated nucleic acid sequence of claim 23, wherein the nucleic acid is cDNA. 

24. The polypeptide encoded by the isolated nucleic acid sequence of claim 23. 

25. A vector comprising the nucleic acid sequence of claim 23. 

26. A host cell stably transfected with the nucleic acid sequence of claim 23. 

27. An antibody that is specifically immunoreactive with the polypeptide of claim 24. 

28. An isolated nucleic acid sequence comprising a sequence substantially identical to 

BTF2. 

29. The isolated nucleic acid sequence of claim 28, wherein the nucleic acid is cDNA. 

30. The polypeptide encoded by the isolated nucleic acid sequence of claim 28. 

31. A vector comprising the nucleic acid sequence of claim 28.

32. A host cell stably transfected with the nucleic acid sequence of claim 28. 

33. An antibody that is specifically immunoreactive with the polypeptide of claim 30. 

34. An isolated nucleic acid sequence comprising a sequence substantially identical to 

BTF3. 

35. The isolated nucleic acid sequence of claim 34, wherein the nucleic acid is cDNA. 

36. The polypeptide encoded by the isolated nucleic acid sequence of claim 34. 

37. A vector comprising the nucleic acid sequence of claim 34. 

38. A host cell stably transfected with the nucleic acid sequence of claim 34. 

39. An antibody that is specifically immunoreactive with the polypeptide of claim 36.

40. An isolated nucleic acid sequence comprising a sequence substantially identical to

BTF4. 

41. The isolated nucleic acid sequence of claim 40, wherein the nucleic acid is cDNA.

42. The polypeptide encoded by the isolated nucleic acid sequence of claim 40. 

43. A vector comprising the nucleic acid sequence of claim 40. 

44. A host cell stably transfected with the nucleic acid sequence of claim 40.

45. An antibody that is specifically immunoreactive with the polypeptide of claim 42.

46. An isolated nucleic acid sequence comprising a sequence substantially identical to 

BTF5. 

47. The isolated nucleic acid sequence of claim 46, wherein the nucleic acid is cDNA.

48. The polypeptide encoded by the isolated nucleic acid sequence of claim 46. 

49. A vector comprising the nucleic acid sequence of claim 46. 

50. A host cell stably transfected with the nucleic acid sequence of claim 46. 

51. An antibody that is specifically immunoreactive with the polypeptide of claim 48.

52. An isolated nucleic acid sequence comprising a sequence substantially identical to 

NTP-3. 

53. The isolated nucleic acid sequence of claim 52, wherein the nucleic acid is cDNA.

54. The polypeptide encoded by the isolated nucleic acid sequence of claim 52. 

55. A vector comprising the nucleic acid sequence of claim 52. 

56. A host cell stably transfected with the nucleic acid sequence of claim 52.

57. An antibody that is specifically immunoreactive with the polypeptide of claim 54.

58. An isolated nucleic acid sequence comprising a sequence substantially identical to

NTP-4.

59. The isolated nucleic acid sequence of claim 58, wherein the nucleic acid is cDNA.

60. The polypeptide encoded by the isolated nucleic acid sequence of claim 58. 

61. A vector comprising the nucleic acid sequence of claim 58.

62. A host cell stably transfected with the nucleic acid sequence of claim 58. 

63. An antibody that is specifically immunoreactive with the polypeptide of claim 60. 

64. An isolated nucleic acid sequence comprising a sequence substantially identical to

RoRet. 

65. The isolated nucleic acid sequence of claim 64, wherein the nucleic acid is cDNA.

66. The polypeptide encoded by the isolated nucleic acid sequence of claim 64. 

67. A vector comprising the nucleic acid sequence of claim 64. 

68. A host cell stably transfected with the nucleic acid sequence of claim 64. 

69. An antibody that is specifically immunoreactive with the polypeptide of claim 66.

70. An isolated nucleic acid sequence comprising at least 18 contiguous nucleotides
substantially identical to 18 contiguous nucleotides of BTF1.

71. An isolated nucleic acid sequence comprising at least 18 contiguous nucleotides
substantially identical to 18 contiguous nucleotides of BTF2.

72. An isolated nucleic acid sequence comprising at least 18 contiguous nucleotides
substantially identical to 18 contiguous nucleotides of BTF3.

73. An isolated nucleic acid sequence comprising at least 18 contiguous nucleotides
substantially identical to 18 contiguous nucleotides of BTF4.

74. An isolated nucleic acid sequence comprising at least 18 contiguous nucleotides
substantially identical to 18 contiguous nucleotides of BTF5.

75. An isolated nucleic acid sequence comprising at least 18 contiguous nucleotides
substantially identical to 18 contiguous nucleotides of NPT3.

76. An isolated nucleic acid sequence comprising at least 18 contiguous nucleotides
substantially identical to 18 contiguous nucleotides of NPT4.

77. An isolated nucleic acid sequence comprising at least 18 contiguous nucleotides
substantially identical to 18 contiguous nucleotides of RoRet.

[Figure: map of the genomic region on a scale of 5000 to 175000, showing the positions of cDNA57 (H2A), mRNA-cDNA37, testicular H1, mRNA-cDNA24, mRNA-cDNA 25/27, the histone genes H2A, H2B, H3 and H4, the H2A and H3 pseudogenes, and EST pairs including yu98e02 and yl77b02.]

BT --MAVFPSSGLPRCL LTLILLQLPKLDSAPFDVIGPPEPILAWGEDAELPCRLSPN

BTF1 MESAAALHFSRPAS LLLLLLSLCALVSAQFIWGPTDPILATVGENTTLRCHLSPE

BTF2 MEPAAALHFSLPASLLLLLLLLLLSLCALVSAQFTWGPANPILAMVGENTTLRCHLSPE 

BTF5 MKMASFLAFLLLNFR VCLLLLQLLMPHSAQFSVLGPSGPILAMVGEDADLPCHLFPT 

BTF3 MKMASSLAFLLLNFH VSLFLVQLLTPCSAQFSVLGPSGPILAMVGEDADLPCHLFPT 

BTF4 MKMASSLAFLLLNFH VSLLLVQLLTPCSAQFSVLGPSGPILAMVGEDADLPCHLFPT 



* ^ ★ *^ * ** * *1r* ir it it 



BT ASAEHLELRWFRKKVSPAVLVHRDGREQEAEQMPEYRGRATLVQDGIAKGRVALRIRGVR 

BTF1 KNAEDMEVRWFRSQFSPAVFVYKGGRERTEEQMEEYRGRTTFVSKDISRGSVALVIHNIT

BTF2 KNAEDMEVRWFRSQFSPAVFVYKGGRERTEEQMEEYRGRITFVSKDINRGSVALVIHNVT 

BTF5 MSAETMELKWVSSSLRQWNVYADGKEVEDRQSAPYRGRTSILRDGITAGKAALRIHNVT 

BTF3 MSAETMELRWVSSSLRQWNVYADGKEVEDRQSAPYRGRTSILRDGITAGKAALRIHNVT 

BTF4 MSAETMELKWVSSSLRQWNVYADGKEVEDRQSAPYRGRTSILRDGITAGKAALRIHNVT 
**.*..* ** *.* * ****_^ * * 

BT VSDDGEYTCFFREDGSYEEALVHLKVAALGSDPHISMQVQENGEICLECTSVGWYPEPQV 

BTF1 AQENGTYRCYFQEGRSYDEAILHLWAGLGSKPLISMRGHEDGGIRLECISRGWYPKPLT

BTF2 AQENGIYRCYFQEGRSYDEAILRLWAGLGSKPLIEIKAQEDGSIWLECISGGWYPEPLT 

BTF5 ASDSGKYLCYFQDGDFYEKALVELKVAALGSDLHVDVKGYKDGGIHLECRSTGWYPQPQI 

BTF3 ASDSGKYLCYFQDGDFYEKALVELKVAALGSDLHIEVKGYEDGGIHLECRSTGWYPQPQI 

BTF4 ASDSGKYLCYFQDGDFYEKALVELKVAALGSNLHVEVKGYEDGGIHLECRSTGWYPQPQI

★ « *.*,, *, * ***** . * * **♦ * * 

BT QWRTSKGEKFPSTSESRNPDEEGLFTVAASVIIRDTSTKNVSCYIQNLLLGQEKKVEISI 

BTF1 VWRDPYGGVAPALKEVSMPDADGLFMVTTAVIIRDKSVRNMSCSINNTLLGQKKESVIFI

BTF2 VWRDPYGEWPALKEVSIADADGLFMVTTAVIIRDKYVRNVSCSVNNTLLGQEKETVIFI 

BTF5 QWSNNKGENIPTVEAPWADGVGLYAVAASVIMRGSSGEGVSCTIRSSLLGLEKTASISI 

BTF3 KWSDTKGENIPAVEAPWADGVGLYAVAASVIMRGSSGGGVSCIIRNSLLGLEKTASISI

BTF4 QWSNAKGENIPAVEAPWADGVGLYEVAASVIMRGGSGEGVSCIIRNSLLGLEKTASISI



BT PASSLPRLTPWIVAVAV ILMVLGLLTIGSIFFTWRLYNER 

BTF1 PESFMPSVSPCAVALP IIWILMIPIAVCIYWINKLQKEKKILSGEK

BTF2 PESFMPSASPWMVALAVILTASPWMVSMTVILAVFIIFMAVSICCIKKLQREKKILSGEK 

BTF5 ADPFFRSAQRWIAALAR TLPVLLLLLGGAGYFLWQQQEEKKTQFRKK 

BTF3 ADPFFRSAQPWIAALAG TLPISLLLLAGASYFLWRQQKEKIALSRET

BTF4 ADPFFRSAQPWIAALAG TLPILLLLLAGASYFLWRQQKEITALSSEI 



BT PRER RNEFS SKERLLEELKWKKATLHA 

BTF1 EFERETREIALKELEKERVQKEEELQVKEKLQEELRWRRTFLHA

BTF2 KVEQE EKEIAQQLQEELRWRRTFLHA

BTF5 KREQELREMAWSTMKQEQS TRVKLLEELRWRSIQYASRGERHSAYNEWKKALF 

BTF3 EREREMKEMGYAATEQEIS LREKLQEELKWRKIQYMARGEKSLAYHEWKMALF 

BTF4 ESEQEMKEMGYAATEREIS LRESLQEELKRKKSST 



BT --VDVTLDPDTAHPHLFLYEDSKSVRLEDSRQK LPEKTERFDSWPCVLGRETFTSGR

BTF1 --VDWLDPDTAHPDLFLSEDRRSVRRCPFRHLGESVPDNPERFDSQPCVLGRESFASGK

BTF2 --ADWLDPDTAHPELFLSEDRRSVRRGPYRQR VPDNPERFDSQPCVLGWESFASGK

BTF5 KPADVILDPKTANPILLVSEDQRSVQRAKEPQD LPDNPERFNWHYCVLGCESFISGR 

BTF3 KPADVILDPDTANAILLVSEDQRSVQRAEEPRD LPDNPERFEWRYCVLGCENFTSGR 

BTF4 



BT HYWEVEVGDRTDWAIGVCRENVMKK-GFDPMTPENGFWAVELY-GNGYWALTPLRTPLPL 

BTF1 HYWEVEVENVIEWTVGVCRDSVERK-GEVLLIPQNGFWTLEMH-KGQYRAVSSPDRILPL

BTF2 HYWEVEVEKVMVWTVGVCRHSVERK-GEVLLIPQNGFWTLEMF-GMQYRALSSPERILPL

BTF5 HYWEVEVGDRKEWHIGVCSKNVQRK-GWVKMTPENGFWTMGLTDGNKYRTLTEPRTNLKL

BTF3 HYWEVEVGDRKEWHIGVCSKNVERKKGWVKMTPENGYWTMGLTDGNKYRALTEPRTNLKL

BTF4 



Figure 3 (Page 1 of 2)



WO 98/14466    PCT/US97/17658

BT AGPPRRVGIFLDYESGDISFYNMNDGSDIYTFSNVTFSGPLRPFFCLWSSGKKPLTICPI 

BTF1 KESLCRVGVFLDYEAGDVSFYNMRDRSHIYTCPRSAFSVPVRPFFRLGC-EDSPIFICPA

BTF2 KESLCRVGVFLDYEAGDVSFYNMRDRSHIYTCPRSAFTVPVRPFFRLGS-DDSPIFICPA

BTF5 PKPPKKVGVFLDYETGDISFYNAVDGSHIHTFLDVSFSEALYPVFRILTLEPTALSICPA

BTF3 PEPPRKVGIFLDYETGEISFYNATDGSHIYTFPHASFSEPLYPVFRILTLEPTALTICPI

BTF4

BT ADGPERVTVIANAQDLSKEIPLSPMGEESAPRDADTLHSKLIPrQPSQGAP 

BTF1 LTGANGVTVP EEGLTLHRVGTHQSL

BTF2 LTGASGVMVP EEGLKLHRVGTHQSL

BTF5 

BTF3 PKEVESSPDPDLVPDHSLETPLTPGLANESGEPQAEVTSLLLPAHPGAEVSPSATTNQNH 

BTF4

BT 

BTF1

BTF2 

BTF5 

BTF3 KLQARTEALY 

BTF4

cgacccacgcgtccgaacatggcgacctaggagaaagggaagaacaattttttctcctcttttgggaagg
tttgcgtctagtagtgcctgtgcccctgggcagattggagagaagagggacgactggagaatcgtcgaga 
accagcggagaaaagaaaaagcaacgtttaattctagaaggcctcctgtccctgcctgctctgggtgctc 
atggaatcagctgctgccctgcacttctcccggccagcctccctcctcctcctcctcctcagcctgtgtg 
cactggtctcagcccagtttattgtcgtggggcccactgatcccatcttggccacggttggagaaaacac 
tacgttacgctgccatctgtcacccgagaaaaatgctgaggacatggaggtgcggtggttccggtctcag 
ttctcccccgcagtgtttgtgtataaaggtggcagagagagaacagaggagcagatggaggagtaccgag 
gaagaaccacctttgtgagcaaagacatcagcaggggcagcgtggccctggtcatacacaacatcacagc 
ccaggaaaacggcacctaccgctgttacttccaagaaggcaggtcctacgatgaggccatcctgcacctc 
gtagtggcaggactaggctctaagcccctcatttcaatgaggggccatgaagacgggggcatccggctgg 
agtgcatatctagagggtggtacccaaagcccctcacagtgtggagggacccctacggtggggttgcgcc 
tgccctgaaagaggtctccatgcctgatgcagacggcctcttcatggtcaccacggctgtgatcatcaga 
gacaagtctgtgaggaacatgtcctgctctatcaacaacaccctgctcggccagaagaaagaaagtgtca 
tttttattccagaatcctttatgcccagtgtgtctccctgtgcagtggccctgcctatcattgtggttat 
tctgatgatacccattgccgtatgcatctattggatcaacaaactccaaaaggaaaaaaagattctgtca 
ggggaaaaggagtttgaacgggaaacaagagaaattgctctaaaggaactggagaaagaacgtgtgcaaa 
aagaggaagaacttcaagtaaaagagaaacttcaagaagaattgcgatggagaagaacattcttacatgc 
tgttgatgtggtcctggatccagacaccgctcatcccgatctcttcctgtcagaggaccggagaagtgtg 
agaaggtgccccttcaggcacctaggggagagcgtgcctgacaacccagagagattcgacagtcagcctt 
gtgtcctaggccgggagagcttcgcttcagggaaacattactgggaggtggaggtggaaaacgtgattga 
gtggactgtgggggtctgtagagacagtgttgagaggaaaggggaggtcctgctgattcctcagaatggc 
ttctggaccttggagatgcataaagggcaataccgggccgtgtcctcccctgataggattctccctttga 
aggagtccctttgccgggtgggcgtcttcctggactatgaagctggagatgtctccttctacaacatgag 
ggacagatcgcacatctacacatgtccccgttcagccttttccgtgcctgtgaggcccttcttcaggrtg 
gggtgtgaggacagccccatcttcatctgccctgcactcacaggagccaatggggtcacggtgcctgaag 
agggcctgacacttcacagagtggggacccaccagagcctatagaatcaattccttggtctcacagccat 
gtagacaagccctggtcatctcagcagccaccgcacaacacccctggtggaagacacgccctcctcccct 
ctggtcacacaagagaacatcttccagctgcctctttcacacccactacagacctcagccccagttttct 
cctcctcactaggctgtgtttttagtagttcctttgcttgtaactatgggatgggatccaggcataggga 
actagttgttacacagctcccagccaagaagaaagtgtgagaagttgatgggcagcaaacctgctgttta 
acatcagggtgaccacattaagcccagtattccagttggcaccagaagatatggacttggaatgaggcct 
acagggttcaccaggatgtaagaggagagaggaatccacaggaccaccagagaggagagggaaccagata 
tgcagatcagagatagaggaagtggaaccagagagctgggagggaccaaggttgtaagggtggctaagtc 
ccaccataacagctaaggggacctgggagatgatggctcatttccacccagccccaggatttccagagcg 
cacatccacaggcctggacctgggatgaagatgaatgaagaacatggatgcacgtggatgtagtttggct 
caggtgtccctgcagttggcaaggagtcagtactcagtccctgagtgtggctgaaatttgaggtcctggc 
tgagccaaggagtaatggaccagatctacctcagtattcaagttcagtggggacaccagtggcttcaaac 
ttcctggtttcatgatatcttgagacgccttacaaatgatggaggattccaaagagtttttgtttatttg 
ggttaatatttgttggtatttatggcatttgagattgaaactaagaaatgttttaatttattacctttac 
aacatttatttacattacatacatacatttacaacatttattaatttatattaaaatagcatgaataagc 
caattataggttaatataagtagaatgtttgtgaaaaataagtatggtatccaaagcaaaataaatttta 
ttgtgaagtgtgaaaaaaaaaaaaaaaaaaaaaa

acgcgtccgcttcggaatgagagactcaaccataatagaaagaatggagaactattaaccaccattcttc
agtgggctgtgattttcagaggggaatactaagaaatggttttccatactggaacccaaaggtaaagaca 
ctcaaggacagacatttttggcagagcatagatgaaaatggcaagttccctggctttccttctgctcaac 
tttcatgtctccctcttcttggtccagctgctcactccttgctcagctcagttttctgtgcttggaccct 
ctgggcccatcctggccatggtgggtgaagacgctgatctgccctgtcacctgttcccgaccatgagtgc 
agagaccatggagctgaggtgggtgagttccagcctaaggcaggtggtgaacgtgtatgcagatggaaag 
gaagtggaagacaggcagagtgcaccatatcgagggagaacttcgattctgcgggatggcatcactgcag 
ggaaggctgctctccgaatacacaacgtcacagcctctgacagtggaaagtacttgtgttatttccaaga 
tggtgacttctacgaaaaagccctggtggagctgaaggttgcagcattgggttctgatcttcacattgaa 
gtgaagggttatgaggatggagggatccatctggagtgcaggtccactggctggtacccccaaccccaaa 
taaagtggagcgacaccaagggagagaacatcccggctgtggaagcacctgtggttgcagatggagtggg 
cctgtatgcagtagcagcatctgtgatcatgagaggcagctctggtgggggtgtatcctgcatcatcaga
aattccctcctcggcctggaaaagacagccagcatatccatcgcagaccccttcttcaggagcgcccagc
cctggatcgcggccctggcagggaccctgcctatctcgttgctgcttctcgcaggagccagttacttctt 
gtggagacaacagaaggaaaaaattgctctgtccagggagacagaaagagagcgagagatgaaagaaatg 
ggatacgctgcaacagagcaagaaataagcctaagagagaagctccaggaggaactcaagtggaggaaaa 
tccagtacatggctcgtggagagaagtctttggcctatcatgaatggaaaatggccctcttcaaacctgc 
ggatgtgattctggatccagacacggcaaacgccatcctccttgtttctgaggaccagaggagtgtgcag 
cgtgctgaagagccgcgggatctgccagacaaccctgagagatttgaatggcgttactgtgtccttggct 
gtgaaaacttcacatcagggagacattactgggaggtggaagtgggggacagaaaagagtggcatattgg 
ggtatgtagtaagaacgtggagaggaaaaaaggttgggtcaaaatgacaccggagaacggatactggact 
atgggcctgactgatgggaataagtatcgggctctcactgagcccagaaccaacctgaaacttcctgagc 
ctcctaggaaagtggggatcttcctggactatgagactggagagatctcgttctataatgccacagatgg 
atctcatatctacacctttccgcacgcctctttctctgagcctctatatcctgttttcagaattttgacc 
ttggagcccactgccctgaccatttgcccaataccaaaagaagtagagagttcccccgatcctgacctag 
tgcctgatcattccctggagacaccactgaccccgggcttagctaatgaaagtggggagcctcaggctga 
agtaacatctctgcttctccctgcccaccctggagctgaggtctccccttctgcaacaaccaatcagaac 
cataagctacaggcacgcactgaagcactttactgatattcattccattattccatatgacagttgtttt 
gagtttcgtaccaccttattgtccccttatacagataaggaaactggggtgcagaaaggtgaattaactt 
tacaaagtagacatgacaagtgaacagcagagctgggatctaaacagcaataactaacattaacagagaa 
tttaaaatgttcttagtgctgtgttataagctttggtggatgtcactcctttaatcctcacaacaccctg 
tcgggtagtcatattttgcaagtatggaagctgaggcagggcaacatgaagtaacttacataattcatac 
agtaatttgtgcagttgggagatgttcagccttagtccctggctaattgcctgttcttttccagcctgat 
tttttttcccacaggaagagcccacatgtagccctgaggtttccttcccaggacagctgcagggtagaga 
tcattttaagtgcttgtggagttgacatccctattgactctttcccagctgatatcagagacttagaccc 
agcactccttggattagctctgcagagtgtcttggttgagagaataacctcatagtaccaacatgacatg 
tgacttggaaagagactagaggccacacttgataaatcatggggcacagatatgttcccacccaacaaat 
gtgataagtgattgtgcagccagagccagccttccttcaatcaaggtttccaggcagagcaaatacccta 
gagattctctgtgatataggaaatttggatcaaggaagctaaaagaattacagggatgtttttaatccca 
ctatggactcagtctcctggaaataggtctgtccactcctggtcattggtggatgttaaacccatattcc 
tttcaactgctgcctgctagggaaaactgctcctcattatcatcactattattgctcaccactgtatccc 
ctctacttggcaagtggttgtcaagttctagttgttcaataaatgtgttaataatgaaaaaaaaaaa 



>CDNA23 

atttgctttctctttttcctttcttccggatgagaggctaagccataatagaaagaatggagaattattg 
attgaccgtctttattctgtgggctctgattctccaatgggaataccaagggatggttttccatactgga 
acccaaaggtaaagacactcaaggacagacatttttggcagagcatagatgaaaatggcaagttccctgg 
ctttccttctgctcaactttcatgtctccctcctcttggtccagctgctcactccttgctcagctcagtt 
ttctgtgcttggaccctctgggcccatcctggccatggtgggtgaagacgctgatctgccctgtcacctg 
ttcccgaccatgagtgcagagaccatggagctgaagtgggtaagttccagcctaaggcaggtggtgaacg 
tgtatgcagatggaaaggaagtggaagacaggcagagtgcaccgtatcgagggagaacttcgattctgcg 
ggatggcatcactgcagggaaggctgctctccgaatacacaacgtcacagcctctgacagtggaaagtac 
ttgtgttatttccaagatggtgacttctatgaaaaagccctggtggagctgaaggttgcagcactgggtt 
ctaatcttcacgtcgaagtgaagggttatgaggatggagggatccatctggagtgcaggtccaccggctg 
gtacccccaaccccaaatacagtggagcaacgccaagggagagaacatcccagctgtggaagcacctgtg 
gttgcagatggagtgggcctatatgaagtagcagcatctgtgatcatgagaggcggctccggggagggtg 
tatcctgcatcatcagaaattccctcctcggcctggaaaagacagccagcatttccatcgcagacccctt 
cttcaggagcgcccagccctggatcgcagccctggcagggaccctgcctatcttgctgctgcttctcgcc 
ggagccagttacttcttgtggagacaacagaaggaaataactgctctgtccagtgagatagaaagtgagc 
aagagatgaaagaaatgggatatgctgcaacagagcgggaaataagcctaagagagagcctccaggagga 
actcaagaggaaaaaatccagtacttgactcgtggagaggagtcttcgtccgataccaataagtcagcct 
gatgctctaatggaaaaatggccctcttcaagcctggtgaggaaatgcttcagatgaggctccaccttgt 
taaataaattggatgtatggaaaaatagactgcagaaaaggggaactcatttagctcacgagtggtcgag 
tgaagattgaaaattaacctctgagggccagcacagcagctcatgcctgtaatcctagcactttggaagg 
ctgaggagggcggatcacaaggtcaggagatcaagaccatcctggctaacacggtgaaaccccgtctcta 
ctaaaaatacaaaaaataaaaaattagccgggcatggtgacgggcacctgtagtcccagctactcgggag 
gctgaggcaggagaatggcatgaacccggaaggcagagcttgcagtgagccgagatcacgccactgcact 
ccagcctgggagacagagcgagactctgtctcaagaaaaaaaaaaaaaaaaaaaaa

>CDNA44

ctgaagcttgcatgcctgcaggtcgacccacgcgtccgcggacgcgtgggcggacgcgtgggtttttcct 

ttcttccagaaggagatttaaccatagtagaaagaatggagaactattaactgccttccttctgtgggct 

gtgattttcagaggggaatgctaagaggtgattttcaatgttgggactcaaaggtgaagacactgaagga 

cagaatttttggcagaggaaagatcttcttcggtcaccatacttgagttagctctagggaagtggaggtt 

tccatttggaattctatagcttcttccaggtcatagtgtctgccccccaccttccagtatctcctgatat 

gcagcatgaatgaaaatggcaagtttcctggccttccttctgctcaactttcgtgtctgcctccttttgc 

ttcagctgctcatgcctcactcagctcagttttctgtgcttggaccctctgggcccatcctggccatggt 

gggtgaagacgctgatctgccctgtcacctgttcccgaccatgagtgcagagaccatggagctgaagtgg 

gtgagttccagcctaaggcaggtggtgaacgtgtatgcagatggaaaggaagtggaagacaggcagagtg 

caccgtatcgagggagaacttcgattctgcgggatggcatcactgcagggaaggrtgctctccgaataca 

caacgtcacagcctctgacagtggaaagtacttgtgttatttccaagatggtgarttctatgaaaaagcc 

ctggtggagctgaaggttgcagcactgggttctgatcttcacgttgatgtgaagggttacaaggatggag 

ggatccatctggagtgcaggtccactggctggtacccccaaccccaaatacagtggagcaacaacaaggg 

agagaacatcccgactgtggaagcacctgtggttgcagacggagtgggcctgtatgcagtagcagcatct 

gtgatcatgagaggcagctctggggagggtgtatcctgtaccatcagaagttccctcctcggcctggaaa 

agacagccagcatttccatcgcagaccccttcttcaggagcgcccagaggtggatcgccgccctggcacg 

gaccctgcctgtcttgctgctgcttcttgggggagccggttacttcctgtggcaacagcaggaggaaaaa 

aagactcagttcagaaagaaaaagagagagcaagagttgagagaaatggcatggagcacaatgaagcaag 

aacaaagcacaagagtgaagctcctggaggaactcagatggagaagtatccagtatgcatctcggggaga 

gagacattcagcctataatgaatggaaaaaggccctcttcaagcctgcggatgtgattctggatccaaaa 

acagcaaaccccatcctccttgtttctgaggaccagaggagtgtgcagcgtgccaaggagccccaggatc 

tgccagacaaccctgagagatttaattggcattattgtgttctcggctgtgagagcttcatatcagggag 

acattactgggaggtggaggtaggggacaggaaagagtggcatataggggtgtgcagtaagaatgtgcag 

agaaaaggctgggtcaaaatgacacctgagaatggattctggactatggggctgactgatgggaataagt 

atcggactctaactgagcccagaaccaacctgaaacttcctaagccccctaagaaagtgggggtcttcct 

ggactatgagactggagatatctcattctacaatgctgtggatggatcgcatattcatactttcctggac 

gtctccttctctgaggctctatatcctgttttcagaattttgaccttggagcccacggccctgagtattt 

gtccagcgtgaaaagaagaagagagttcctccaattctgaccgagtgctgatcattccctagagacacca 

gtaaccccgggcttagctaacgaaagtggggagcctcaggctgaagtaacttttctctgcttctccctgc 

ccagctcagagctgagggcctccccctccacagcaaccaatcacaaccataaagctacaagcacgcactg 

aagcactttactgatactcattcaattattcatatgacagttgtttgagtttggtaccatcttattttcc 

ccttatacagataaggaaactggggtgcagaaaagtgaattgactacaaagtagacatgactagttaaca 

acacagctgggatctaaacagcaataactaacattaatggagaacttaaaatgctctgagtgctgtgtta 

tgagctttggtggatgtcactcctttaatcctcgcaacaccctgtcgggtagtctcatttagcaagtatg 

gaagttgaggcagggcaacattaagcaacttacataactcatgcagtaatttctgcagttgggagatgtt 

cagcttcagtccccggccctatggccgttcttttccaccctgtttcttcccccataggaagaacccacct 

gtagccctgaggttcttttcccaggatggctccaggataaggatcactgtaggtggttgtggagttgaca 

cccctgttgactccttcccagctgattgtcagagccttagacccagcacgccttggattagctttgcaga 

gtgtcttggttgagagaataacctcaccgtacccacatgacacgtgatttggaaagagactagaggccac 

acttgataaatcatggggaacagatgtgttccacccaacaaatgtgataagtgatcatgcagccagagcc 

agccttccttcaatcaaggtttccaggcagagcaaataccctagagattttctgtgatataggaaatttg 

gatgaagggagctagaagaaatacagggatttttttttttttttaagatggagtcttactctgttgctag 

gctggagtgcagtggtgcgatctcagctccctgcaacctccacctcctgggttcaaacaattctcctgcc 

tcagcctcccgagtactgggaatataggtgcacgccaccacacccaacaaatttttgtacttttagtaca 

gatgagggttcactatgttggccaggatggtctcgatctcttgacctcatgatccacccacctcggtctc 

ccaaagtgctgggattacaggcttgagccaccgggtgaccggcttacagggatatttttaatcccgttat 

ggactctgtctccaggagaggggtctatccacccctgctcattggtggatgttaaaccaatattcctttc 

aactgctgcctgctagggaaaaactactcctcattatcatcattattattgctctccactgtatcccctc 

tacctggcatgtgcttgtcaagttctagttgttcaataaatttgttaataatgctgaaaaaaaaaaaaaa 
aaaaaaaaaaaaaaaaaaaaaa 



>CDNA32 

agagaacaggtcccagataccgagtccgcaaccccaaacatcgcgattaataggaggcctctggtctctg 
cctgccctgggtgctcatggaaccagctgctgctctgcacttctccctgccagcctccctcctcctcctc 
ctgctcctcctccttctcagcctgtgtgcactggtctcagcccagtttactgtcgtggggccagctaatc 
ccatcctggccatggtgggagaaaacactacattacgctgccatctgtcacccgagaaaaatgctgagga 
catggaggtgcggtggttccggtctcagttctcccccgcagtgtttgtgtataagggtgggagagagaga
acagaggagcagatggaggagtaccggggaagaatcacctttgtgagcaaagacatcaacaggggcagcg

tggccctggtcatacataacgtcacagcccaggagaatgggatctaccgctgttacttccaagaaggcag 

gtcctacgatgaggccatcctacgcctcgtggtggcaggccttgggtctaagcccctcattgaaatcaag

gcccaagaggatgggagcatctggctggagtgcatatctggagggtggtacccagagcccctcacagtgt 

ggagggacccctacggtgaggttgtgcccgccctgaaggaggtttccatcgctgatgctgacggcctctt 

catggtcaccacagctgtgatcatcagagacaagtatgtgaggaatgtgtcctgctctgtcaacaacacc 

ctgctcggccaggagaaggaaactgtcatttttattccagaatcctttatgcccagcgcatctccctgga 

tggtggccctagctgtcatcctgaccgcatctccctggatggtgtccatgactgtcatcctggctgtttt 

catcatcttcatggctgtcagcatctgttgcatcaagaaacttcaaagggaaaaaaagattctgtcagga 

gaaaagaaagttgaacaagaggaaaaagaaattgcacagcaacttcaagaagaattgcgatggagaagaa 

cattcttacatgctgctgatgtggtcctggatccagacaccgctcatcccgagctcttcctgtcagagga 

ccggagaagtgtgaggcggggcccctacaggcagagagtgcctgacaacccagagagattcgacagtcag

ccttgtgtcctgggatgggagagcttcgcctcagggaaacattactgggaggtggaggtggaaaacgtga 

tggtgtggactgtgggggtctgcagacacagtgttgagaggaaaggggaggtcctgctgattcctcagaa 

tggcttctggaccctggagatgtttggaaaccaataccgggccctgtcctcccctgagaggattctccct 

ttgaaggagtccctttgccgggtgggcgtcttcctggactatgaagctggagatgtctccttctacaaca 

tgagggacagatcacacatctacacatgtccccgttcagcctttactgtgcctgtgaggcccttcttcag

gttagggtctgatgacagccccatcttcatctgccctgcactcacaggagccagtggggtcatggtgcct 

gaagagggcctgaaacttcacagagtggggacccaccagagcctatagaatcaattccttggactcacag 

ccatgcagataagccctggccatctcagcagccaccgcacaacccccctaatgaaagacacgccctcctc 

ccctctggtcacgtaagagaacatcttccagctgcctttttcacacccactccagccctctgccccagtt 

ttctcctcctcactagtctgtggctttagtagttcctttgcttgtaattatgggatgggatccaggcata 

gggaactagttgtttcatagctcccagtcaaaaagaaagtgagagaagctgttgggcagtgaacctactg 

tttaaaatcaggataaccacattaagcccaatatgccagttggcaccagatgctgtggacttggaatgag 

gccaacagggttcaccaggatgagagaggagagaggaatccacaggaccaccagaagggagagggaacca 

gatatgcagatcagagatagaggaagtggaaccagagagctgggagggaccaaggttgtaaggatggcta 

agtcccaccataagagctaaagggtcctgggagatgatggctcatttccacccaaccccaggatttccac 

agcacacacccacaggcctggacctgggatgaagatgaatgaagaacatggactcatgtggatgtggttt 

ggctcagatgtccctgcaataaacaaggggtcagtacttagtccctgagtgtggttgaggtttgaggtcc 

tggtcgagcagggcagtactggaccaggtctacgtcagcattcaggttcaatggggacaccagtggcttc 

aaacttcctgatctaattatgtttttagacacttagaagttattgaggactttaaagagcttttgtttat 

ttgggttaatatttatgacatttgacattgaaacaaaaatttaaaatgttatcttttaatttatgttaaa 

atagcattaataaatcagttataggttaatgtagataggatgttttgtgaaaaagcaatctattgtgtcc 

aaataaaaaaaacaaaaagtgtgacactggttaactttttccagatctcatgtctggcttaataagagat 

atttgtattatcatatctgcctttgtattaaacctattggtatatcataggtcatgttagctcaaaaaaa 

ctttactgcacactactgagagaatgagatgaaaaacgattaatgtttcattattattattgtgaaaata 

ttattaacactggggactccttaagagtacatcagagttctctctaggaatcccaaaaccacattttgaa 

actagaatagtggatcctggaagttaatccatgtgctggttaattttagatgtcaacctggtgtttccag 

aagagattggcaagtgagtcagtgggaaattctctccttctgttggctgggtgcccaatacaacaaaaag 

gcagaggaaaggcaaattcttctctcctctggagctgagacactcttcttcttctgcccttggacatcag 

aactcctggctctccggcctttgaacttcaggacttgtaccaggaggccctgggttctcaggcctttggc 

tttggactgagagttacacaatcagcttccctggttctgaggctttcagacttaaactgagccatgctac 

cagcatcccagggtctccagcctacagatgagctgttgtgcgatttcttagcctccataatcacatgagc 

caatctccttaataaatgcctgctcatagatctgtatctacatctatatctgtatgtgcatctatatcta 

tgcctatatctatatctatatcatattgattttgtctctctggagaaccctgactaataaaatgaggcat
ctaaaaaaaaaaaaaaaa 



>CDNA27

gacccacgcgtccgaaaagctatggcctcaaccaccagcaccaagaagatgatggaggaagccacctgct 
ccatctgcctgagcctgatgacgaacccagtaagcatcaactgtggacacagctactgccacttgtgtat 
aacagacttctttaaaaacccaagccaaaagcaactgaggcaggagacattctgctgtccccagtgtcgg 
gctccatttcatatggatagcctccgacccaacaagcagctgggaagcctcattgaagccctcaaagaga 
cggatcaagaaatgtcatgtgaggaacacggagagcagttccacctgttctgcgaagacgaggggcagct 
catctgctggcgctgtgagcgggcaccacagcacaaagggcacaccacagctcttgttgaagacgtatgc 
cagggctacaaggaaaagctccagaaagctgtgacaaaactgaagcaacttgaagacagatgtacggagc 
agaagctgtccacagcaatgcgaataactaaatggaaagagaaggtacagattcagagacaaaaaatccg 
gtctgactttaagaatctccagtgtttcctacatgaggaagagaagtcttatctctggaggctggagaaa 
gaagaacaacagactctgagtagactgagggactatgaggctggtctggggctgaagagcaatgaactca
agagccacatcctggaactggaggaaaaatgtcagggctcagcccagaaattgctgcagaatgtgaatga
cactttgagcaggagttgggctgtgaagctggaaacatcagaggctgtctccttggaacttcatactatg 
tgcaatgtttccaagctttacttcgatgtgaagaaaatgttaaggagtcatcaagttagtgtgactctgg 
atccagatacagctcatcacgaactaattctctctgaggatcggagacaagtgactcgtggatacaccca 
ggagaatcaggacacatcttccaggagatttactgccttcccctgtgtcttgggttgtgaaggcttcacc 
tcaggaagacgttactttgaagtggatgttggcgaaggaaccggatgggatttaggagtttgtatggaaa 
atgtgcagaggggcactggcatgaagcaagagcctcagtctggattctggaccctcaggctgtgcaaaaa 
gaaaggctatgtagcacttacttctcccccaacttcccttcatctgcatgagcagcccctgcttgtggga 
atttttctggactatgaggccggagttgtatccttttataacgggaatactggctgccacatctttactt 
tcccgaaggcttccttctctgatactctccggccctatttccaggtttatcaatattctcctttgtttct 
gcctcccccaggtgactaaggaaaagagcagaagctccttggtttaaccagcacagagaaaataatataa 
atcccataagggcagacgtttggtctgttttcttcgctgtcatttccttagtagttagactagtgctgag 
attttagtggatatataattgatttatgttgaatatatggacttagcaactaaaaataccacagatggtt 
aacctggactggggcaaagcaagataatagtgatgatcgtatgttgctgtctccatccgtctttaatggg 
tcagggctttgatttccaagggtcttcaggtgatgagtaggggtacccacaagtcagaaggtctgcgttc 
tcctagtttgtttgctgccatttgaactcatgtagggaatgaaagaaagctgcaattatccgccaactgc 
atttaaaacaaaacaaaacagaaaaatcaaaataacattgactcttccaaccactgacatgttgtttaat 
aatctaagcggcagtcctggaggctaccagacttactgagttctacctgagaaacagccaagcaaagtgt 
gagagaagggttaagactggcttacaatgagatgcttcaaatgaaaagggaattatgagtaaaattgaac 
tttgatgggggattcagttctggaaaagaatttggtattttccagtctgctaggaccaattaccttgaaa 
tattttaaaatctcagtaaatagttattgctgaaatggctgttggcagttcttattatgattcagagaag 
agcaaatagaccttaacttcattttgaaaaagaccaaattaccatacccgagtgagtaatgacaggacta 
caactaaaacataaacaacattaatgatgaccataaaaagtcacaaaattgctaaatgttataatttaga 
gttgacataaaaattgatggccaggcatggtggctcacgcctgtaatcccagaactatgtgaggctgagg 
caggtggatcacttgaggtcaggagttcaacaccagcctggccaacatggtgaaaccctgtctctactaa 
aaatacaaaaattagccgggcatggtggtaggggcctgtaacccagctactcgtgaggccaaggcaggag 
aattgcttgagcctgcagcagctgcagtaagccaagatcatgctgtgcctcaaggaaaaaaaaaattaat 
gtttactgatatttgttgaagtcctacaacatcacctctgagaataggagaaatgaagcaacagttgtgt 
ctagatgtcagaggcatggctgggcctccatctctgcctaagggagatataaaagagttcaaactattgc 
ccatgttccccagggtcagaagttctaattatgatgatagaggctgggttgtaagtagtaagtgaagggt 

agcagaatatgccatctttggcataagaagtattttgagttgaagacaattgagaaaaaaaaaaaaaaaa 
aa 



>CDNA22B 

ggacagaaaactccctccttttccaagttagccttatagtctagggcttaaaatactggtttaatggtga 
aggtaagtgcttttcttctttttgggtagaaggattattactaacttaccaaaggtccattaaggggagg 
gaacagttttaggagaagtcagagaaaagacattaacagcaacataaggatctccatctggtaatattgc 
ctaattccaaaatgaagagactctctgaaaaagataactgattcaatgaagaccctagggcaaggcttga 
gaagccactggtaccaatggacactgtggacaatggtcatttctccaaggacgctataaaagactgtcgt 
agtaaaagagattcagggcacagggaaactccaccacaaagcgtggtaccatttcccacagaagctaaat
ggacgggaagcctgccaccaggaaaggtccagatttctgttcattacgctatgggctggctcttatcatg 
cacttctcaaacttcaccatgataacgcagcgtgtgagtctgagcattgcgatcatcgccatggtgaaca 
ccactcagcagcaaggtctatctaatgcctccactgaggggcctgttgcagatgccttcaataactccag 
catatccatcaaggaatttgatacaaaggcctctgtgtatcaatggagcccagaaactcagggtatcatc 
tttagctccatcaactatgggataatactgactctgatcccaagtggatatttagcagggatatttggag 
caaaaaaaatgcttggtgctggtttgctgatctcttcccttctcaccctctttacaccactggctgctga 
cttcggagtgattttggtcatcatggttcggacagtccagggcatggcccagggaatggcatggacaggt 
cagtttactatttgggcaaagtgggctcctccacttgaacgaagcaagctcaccaccattgcaggatcag 
ggtcagcatttggatccttcatcatcctctgtgtggggggactaatctcacaggccttgagctggccttt 
tatcttctacatctttggtagcactggctgtgtctgctgtctcctatggttcacagtgatttatgatgac 
cccatgcatcacccgtgcataagtgttagggaaaaggagcacatcctgtcctcactggctcaacagccca 
gttctcctggacgagctgtccccataaaggcgatggtcacatgcctaccactttgggccattttcctggg 
ttttttcagccatttctggttatgcaccatcatcctaacatacctaccaacgtatatcagtactctgctc 
catgttaacatcagagatagtggagttctgtcctccctgccttttattgctgctgcaagctgtacaattt 
taggaggtcagctggcagatttccttttgtccaggaatcttctcagattgatcactgtgcgaaagctctt 
ttcatctcttgatatgcaagtttcctcatgggaatctcaaggggatttgggctcatcgcaggaatcatct 
cttccactgccactggattcctcatcagtcaggattttgagtctggttggaggaatgtctttttcctgtc 
tgctgcagtcaacatgtttggcctggtcttttacctcacgtttggacaagcagaacttcaagactgggcc
aaagagaggacccttacccgcctctgaggacataaagttacaaacttaaatgtggtactgagcatgaact

ttttaaacattttttacttctctccatattcctgaccatagactcagcagttcttaactctggctgtgtg 

ttagtcttccctggggagcctttataagacactgatacttgggacccactccagagattctgaatgaatt 

ggtctggggtggaacccagatactactaatttttagatactccttagaggtttctagcatgcgcccgggg 

ttgacaacagctggacaaacttgaaaagtcaattcatgtggcctttgaattttcctcattggaaagtact 

aaataaataaaaattcatgtgaaaatgatcactgataaatatcttcatggtggggcaggttattggatgc 

agagaagatctgctcggaattgtagccatatgttacagatctcagcaccgatcagaactgtaaagctata

atccccagaattaaagtttttattattttttatacattgtaaaacatagacgtttatttatgtgattaaa 
ttctattaaaatttacatgctaaaataaaaaaaaaaaaaaa 

>CDNA22E 

acgcgtccgcccacgcgtccgcccacgcgtccggtcggggccagagcgcaggtgtacctggcggccgtgc 
tggagcacctgaccgccgagatcctggagctggctggcaacccggcccgcgacaagaagacccgcatcat 
cctgcgccacctgtagctggccattcgcaacggcgaggagcttaacaagctgctgggcgaagtcaccatc 
gcgcagggcggtgtcctgcccaacattcagggcgtgcttctgccccagaagaccaagagccaccacaagg 
ccaagggtgaaaaccattcactaggagaggagaaacacaatggccaccaagacagagttgagtcccacag 
caagggagagcaagaacgcacaagatatgcaagtggatgagacactgatccccaggaaaggtccaagttt 
atgttctgctcgctatggaatagccctcgtcttacatttctgcaatttcacaacgatagcacaaaatgtc 
atcatgaacatcaccatggtagccatggtcaacagcacaagccctcaatcccagctcaatgattcctctg 
aggtgctgcctgttgactcatttggtggcctaagtaaagccccaaagagtcttcctgcaaagtcctcaat 
acttgggggtcagtttgcaatttgggaaaagtggggccctccacaagaacgaagcagactctgcagcatt 
gctttatcaggaatgttactgggatgctttactgccatcctcataggtggcttcattagtgaaacccttg 
ggtggccctttgtcttctatatctttggaggtgttggctgtgtctgctgccttctctggtttgttgtgat 
ttatgatgaccccttttcctatccatggataagcacctcagaaaaagaatacatcatatcctccttgaaa 
caacaggtcgggtcttctaagcagcctcttcccatcaaagctatgctcagatctctacccatttggrcca 
tatgtttaggctgtttcagccatcaatggttagttagcacaatggttgtatacataccaacttacatcag 
ctctgtgtaccatgttaacatcagagacaatggacttctatctgcccttccttttattgttgcctgggtc 
ataggcatggtgggaggctatctggcagatttccttctaaccaaaaagtttagactcatcactgtgagga 
aaattgccacaattttaggaagtctcccctcttcagcactcattgtgtctctgccttacctcaattccgg 
ctatatcacagcaactgccttgctgacgctctcttgcggattaagcacattgtgtcagtcagggatttat 
atcaatgtcttagatattgctccaaggtattccagttttctcatgggagcatcaagaggattttcgagca 
tagcacctgtcattgtacccactgtcagcggatttcttcttagtcaggaccctgagtttgggtggaggaa 
tgtcttcttcttgctgtttgccgttaacctgttaggactactcttctacctcatatttggagaagcagat 
gtccaagaatgggctaaagagagaaaactcactcgtttatgaagttatcccaccttggatggaaaagtca 
ttaggcaccgtattgcataaaatagaaggcttccgtgatgaaaataccagtgaaaagatttttttttcct 
gtggctcttttcaattatgagatcagttcattattttattcagacttttttttgagagaaatgtaagatg 
aataaaaattcaaataaaatgataactaagaaaaaaaaaaaaaaa

ACAGCAGAAG
AAGGAGTATG 
TACTTTGGAT 
GAGTAAATGG 
GAGCACGTTT 
ATGGAGATGT 
TGGAGATGTA 
GAAGCCATTA 
ATTTAGAGGA 
TTGAAGTGTA 
AATGGATATG 
CCAGAAACTG 
AAATAAATGT 
GACTAAGTTT 
GTGGCTTTGG 
AGCCAATTGT 
CTCAGAAAGG 
CGATCATGGA 
GGCTCACGCC 
GTTTGAGACC 
GCTGGGCATG 
CGCTTAAACC 
TGGGATACAG 
GGACATTAAA 
CTGTAGAAAT 
GTTTACTGGA 
CAAAGTATTG 
AGAGCCGAAA 
TAGATTTCTC 
ACAATGTTTT 
CAGGGTCTGG 
CTTTGCCTTC 
GTATGCACCA 
ATGTTGCCTA 
AAAGTGTTGG 
AGATAATGGG 
TGGGATTATT 
GGAACAAAGG 
GCTGTCAATG 
AAGATCACTA 
GTAGGGCAAT 
TCTTGCCTTA 
AGCGCTCCTG 
CTATGTTGGC 
CCAAAGTGCT 
AGAGTACAGA 
TCAAAGATGC 
CCACCAAACT 
CAGAGGTGAC 



CACACACACA 
CTGATATTTG 
TAGCCTTGTC 
ATCAAACAGG 
TAGCTGCTTG 
AGGTGAACAG 
GAGAATGAAT 
TTGTAGAGAT 
TAGTGTCAGT 
ATTCCTGTGT 
CAAGTGAAAA 
AAATTGGAAG 
ATGAGACAGA 
TAGTATAGTC 
AATTTAGAGC 
GTTATCTGCC 
ACTATGCTGG 
CTATCGTTTA 
TGGTACCCAG 
AAATTGGTGA 
GAAGGGACTA 
AAGAGAGCTG 
TAGAATATTA 
TGTAATCTCA 
AGCCTGGCCA 
GTGATGTGCT 
CGGGGGGTGG 
AGCAGGACTC 
GTCAACTCTT 
CACTGTTCTT 
AAGAACTTAT 
AAGGTGTGAT 
TAAAGAAGGA 
AATCTATATT 
CTTTTTCTTT 
CTATGTCATC 
AGGCTCAAGC 
CCACACCCTG 
GGCTGGTCTC 
GATTACAGGC 
TGTGACCCAA 
CCAGCAGAGA 
AAGGTTGTCG 
TGTACTATTC 
GGGCTGACTC 
GGTGTGATCT 
GACTCCCAAG 
CCACCACTTG 
CAGGCTAGTT 
GGGATTACAG 
TGGGATAGGG 
CCTGCAGAAC 
GAAAGACCGA 
ACTGAQACAC 



CACACACACA 
TTATTTCATA 
TGAAACAGAG 
CATTTCAGAG 
ATGTGAAAAG 
AGGCCAGAGA 
TATTGCATGT 
GAAGGAAATG 
TATTGAACTG 
CTTGGAAGTG 
TGTGGCTACA 
TTTACTGCAT 
ACAAAGACTA 
TGGTCATTTT 
CCTACACTTT 
TGGTGTCTGT 
CAACTTGGAT 
AGCCACCAGT 
GCGTGGGATG 
TGGGTAAAGG 
TTGAAAGAAA 
GACAGAAAGC 
AATATGCTGG 
GCACTTTGGG 
ATATGGCGAA 
TCTGTGGTCC 
AGGTTGCAGT 
CACTCCCCCC 
GTGAGGTCTC 
GTTACAATGT 
AAGCAGTAAA 
TTAGGTCCTC 
ATTTTTAAGC 
GTAAAAATTG 
TTTTTTCTTG 
CAGGCTGGAG 
AATCCTCCCA 
GCTAATTTTT 
TAACTCCTGA 
GTGAAACACT 
GGATTTAATC 
CACTGCCAAT 
ACTTTTTGAA 
TTTAAGAAAA 
TTTTGTTTTT 
CAGCTCACTG 
TAGCTGQGAT 
CCCAGCTAAT 
TGGAACTCCT 
GCAGGAGCCG 
TGGGGGTGGG 
TGTGTGGGAG 
GACTTCAGGC 
CACTGGGCCT 



CACACAAATG 
TTCTCAGATT 
CTGGGACCTG 
ATTGAGGCCA 
AGACCAGCGT 
TGGTCACTGA 
ATTGAATATG 
TAGCAAGTGA 
GGGAGAACTG 
TTTAGGGTGA 
CACATTTGCA 
ATAGATAGTC 
GGGACCAGAG 
GAGGTGAATA 
TAGCTCTGAC 
GAAATAATTT 
CTTAGATTTC 
CTGTAGTATT 
CTGCAACAAC 
CTGGAAGAGT 
TATGGACATT 
TTCCATTTTC 
TTAAAATATG 
AGGCTGAGGG 
ACCCTGTCTC 
CAGCTACTCG 
GACCCAAGAT 
GCCACACACA 
AGATGAAAAT 
GTCAAGAACT 
ACTGGATATT 
CTTACTGCTT 
AAAACACAAT 
AGAAAGTTTT 
GTTTTATTTT 
TGCAGTGGCA 
CCTCAGCCTC 
TGTTGTTGTT 
GCTCAAGTGA 
GAGCCTAGCC 
AGCCATCTCA 
TTAAACTAAC 
TTCTATAGAA 
GGAAAGACTG 
TCTTGAGGCA 
CAATCTCCAC 
TACAGGCTCT 
TTTTGTATTT 
GACCTCCAGT 
CCAGGGCTGC 
AACATGTAGT 
TCTCTCACAG 
AGGGCAGATG 
GGAAATCAGG 



AGGTATATAA 
TTTAATCCAT 
ATGAGTGAAA 
AGAAGTTAAA 
GGCTGGAACA 
GTGGGCCCTT 
TAGGTGACGT 
CACTCTTAGA 
GAAGGGATAA 
AAGACCTATT 
TTTCAGAAAA 
TTTGGAACCG 
CCAAGCTCCA 
CTTAATAACA 
TATTAACGAA 
AAGCCAGGAA 
CAGCCTGCAG 
TTGTTATGGC 
AAATACCTAA 
TTGAGGTTCA 
AAAGGCAATT 
ATAGAAACTT 
GACTTTAGGC 
CACAGATCAC 
TACTAAAAAT 
GGAGGCTGAG 
CACACCACTG 
CACAAAAAAT 
GAGGGACAGG 
TGGCTGAATT 
TACCAGAAGA 
AAAGTGAAAT 
CAGAACTTGG 
TCTTGAAGAG 
TATTTTTATG 
CAATCTCAGT 
CTAAGTAGCT 
TATAGAGATG 
TCTGCCCTCC 
TGAACAACCA 
GCAGAAGCCA 
GTAGGCAGAG 
CAGGATCATA 
ACCCACCAAA 
GTCTCACTGT 
CTCCCAGGTT 
AAATCTGTAC 
TTAGTAGAGA 
GATCCATTCT 
CACTTTGATG 
CAAGGCTGAC 
ATGGCTGCCT 
GAGTAGGCCA 
GCATCAAGCC 



AGGGTCTCCT 
TTAGGTAGGT 
ATGAGCTCAC 
TGTCTTAAAT 
GCAAAGGAGA 
AAGTCATGGT 
GACTCACAGA 
ATGTTGATTT 
CAGGCTTAAG 
AGAGTTCTAA 
AAGGTCAGGC 
TAGTATTGAT 
AGTTTCTAAA 
GAACAATTTG 
TACAGGAAAG 
GAGATCCTCA 
AATTGTTAGA 
AGTCCAAGCT 
ACATGGGGAA 
TACTAGAAAA 
CTGGCAAAGG 
AGATTTATAA 
CAGGCGTGGT 
GAGGTCGGGA 
ACAAAAATTA 
GCTGAAGAAT 
CACTCCAGCC 
ATATATATAT 
TTATTGGAAA 
ACGCTGTAGT 
GATGTCTAAG 
GTGAGAGGAA 
AGATTTGGGA 
GTATGGTTGA 
TTTTTTGAGA 
TCAGTGCAAC 
GGGACTACAT 
GGGTTTTGAC 
TCAGTCTCCC 
TTTGATAAAG 
GGAAGAGAGA 
AAAACAGAAA 
GAGCTACCTG 
GGCAACTTAC 
CACCCAGGCT 
CAAGGGATTC 
CCTCCCGAGT 
TGGGGTTTCA 
CATTGGCCTC 
TCAGACTCAG 
TCTACCTGTT 
GGGTGGGACC 
ACTACAGAGC 
AAAGAGGGTT 
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3241 TTTCTTAAGA CCTAACAGAA TTTGCCTTGC CAGGTTTTGG ACTTGATTAG GACACATTAC 

3301 ACCTTCCTTC TTTCCTATTT CTCCATTTTC TAATGGGAAT GTCTATTATG CCTGTTTCAC 
3361 CATTGTACCT TAGAAGCATG TAACATTTCT GGTTTCACAC GTTCAAAGCT GGAAAGGAAT 
3421 TTTGTCTCTG GATGAATCAC ACATTGAGCC TCACCCGTAA CCTGATTTAG ATGATTTTTT 

3481 AGATGACACT TTGAACTTTA GAATTGATGC TAGAATGAGT TAAGACTTTC AGGGGGCTGT 
3541 TGGGATGGAA TAATTTTTTT TTTTTTTTTG AGACGGAGTC TAGCTCTGTC GCCCAGGCTG 
3601 GAGTGCAGTG GCACCATCTT GGCTCACTGC AAGCTCTGCC TCCCGGGTTT ATGCCATTCT 
3661 CATGTCTCAG CCTCCAGAGT AGCTGGGACT ACAGGCGCCC GCCACCACGC CTGGCTAATT 
3721 TTTTTTTTAT TTTAGTAGAG ATGGGGTTTC ACCGTGTTAG CCAGAATGGT CTCGATCTCT 
3781 TGACCTTCTG ATCCGCCTGC CTTGGCTTCC CAAAGTGCTG GGATTACACG TGTGAGCCAC 
3841 CATGCCCGGC TGGGATGGAA TAAATTTATC TTGTATGGGA GAAGGACATA CATTTTGGCA 
3901 GGTCAAGGAC AGAATGTTAT GGACTAAACT GTGTCCCCCA AAATTCATTT ATTAAAACCC 

3961 TAAACCCCAG TGTGACTGCA TTTGGACATA GAGCCTTTAG GGGGTACATA AAACTAAAGA 
4021 TCACAGGATA GGGCCCTAAT CCCATTGGGG CTGGTGTCCT TACAGAAGAT GAGACACTTA 

4081 GAGCTCTCTC TCCACGCAGG CACCAAGGAA ACACCATACA AACACACAGT GAGATGGCAG 
4141 CCATCTGTTA GCCAGGAACA GATTCTCACC ATAAACTATG TTGGCACCTT GATCTTAAAC 
4201 TTCCAGGCTC CAAAACTGTG AGAAAATGAA TTTCTGTTCC AAGCCTCTTA GATATGGAAA 
4261 AAAAGATTCT GTTGTTTAAG CCATCCAGTC TCTGGTATTT TGTTATGGCA GCCTGAGTAG 
4321 GCTAAGACAA TGAAGGATGT GGTAAAACTT TACGTCCCAA CCACATACCA AAGAGGCTGG 

4381 AATTTAGCAT GCTTTCTTCT TTCAACTGTA GGCAATGTGC ACAAGTTCTA AATCCTAAGA 

4441 CATGTTGGCT CCTTTACTCT GCCCAAACTA CAACTCAAAC AAACAACTGT AATATAATAA 
4501 CATCCAATGA AGTTCTGACA TTTCTTCAAC ATGAGTACAG TAATTCAATG CCAGAGAATT 
4561 CATTTTATTT TGAAATCTAC ATGCCATATT CCAATTTCTG TTGAAGATGC AATGGTTATA 
4621 TTTATTCTTT TTAATATAGA TTTATCAGAC TGGGCGCGGT GGCTCATACC TGTAATCCTA 
4681 GCATTTGAGA GGCTGAGGTG GGCATATCAC CTGAGGTCAG GAGTTTGAGA CCAGGCTGGC 
4741 CAACATGGTG AAACCCTGTC TCTACTATAA ATATAAAAAT TAGCTGGGTG TGGTGGTGCA 
4801 TGCCTGTAGT CCCAGTTACT AGGGAGGCTG AGGTAGAATT GCTTGAACCT GGGAGCAGGA 
4861 GGTTGCAATG AGTGGAAATC GCACCAGTAC ACTCCAGCCT GGATGACAGA GCAAAATAAT 
4921 AAATAAATAC ATAAAATAGA TTTATCAGTT TATCAATAAT ATAGTTTTCT TTTCTAGGTG 
4981 TAAATATAGG TAATGACTGT CCTTTAGTAC ATTTTCTCAT GATGCTCCTC TTACTTGGTT 
5041 TGGTACAATA TTAAGTATTG AAATAAAATA GAGAATCCTG TCGCTACACA TGAGCACTTA 
5101 TTCCATTTGC TCATCTCCAA TATGCACGGG AAATTCTCAA ATTGCTAATA ATCTTGTAAC 
5161 ACACATGCAT TATATTCAAC AGGAATATAT AAATTTATAA TTATAATTTA GGATCAACAG 
5221 ATGACAAACC TTTAGAAGGT TTGTATTTAA CCTTAAAATA TAATTTTTTA AAAATTGGTT 
5281 ATAAAATTTC TAATACTTTC TTTTTTGTGA CCTCAAGGGG AAAATATAAT TCTTATAAAA 
5341 GTTCAAATGA TTTACAGAAT ACAAAAAGTG AATAGAGATG ATGAATGAAT TAAAGGAAAG 
5401 GATATTGCTA CATAGATTTG GAAATTTAAA AAGGGAAATT ACGATTGTTG ATTTTGTGTT 
5461 AAACTGATCT GCTTTGTTCA AGATACCTTA TGTACCAAAA AATGATTTTA TCTCAGCCTC 
5521 ATATCTCAGT AAATTCCTGA GACAAACTTT AGTCCCTGGT GCCCAGGTGC CTTTGGTAAT 
5581 TGGGAGACCT CTAGGTTTAG CATCCTCATC CACTCGCCCC AATTTAAATA GTCCTCCCCA 
5641 GGGCCATTCA GGCAAGGGAG ATGAAAACTT GCTCAAGAGT TGGAATCCAA CTGAAGCTAC 
5701 CGAAATTCAT TGCTCAATAG ATAATTTTCC CTGGAAGTAA CTAGGGCTTT TGAATATAAT 
5761 AGTGGGCATT TCAAAGTAGA AGGTAAAGTA TTTTGGAGAT GAGGAGACAG GACAGAGCTA 
5821 CGAGGAATGT CCTTTGCTTA GGGACTAGGC TCTTAGCAGT ACCTCTTAGG TAAGAACTGG 
5881 TTAACTGGCA CCTTCTGTGT TTCTCTGAAG CTCCCTTTGC TTAGGGACTA GGCTCTTAGC 
5941 AGTACCTCTT AGGTAAGAAC TGGTTAACTG ACACCTTCTA TGTGTCTGAA GCTCCCAGAA 

GTGAAATTTG GATTTTTGGA ATATAGTTTC TTTTTTCTTG TTACTTTTTG 

6061 TTTTGTTGTT TTTTTTTGAG AGTCTCACTC TCACTGCAAC CTCCCCCTCC TATATTCAAG 

6121 TGATTCTCTT GCCTCAGCCT CCCGAGTAGC TGGGACTACA GGCGTGCACT AGCATGCCCA 

6181 GCTAATTTTT GTATTTTTTA GTAGAGATGG GGTTGGTTTT TTTTTGAGAC GGAGTTTCAC 

6241 TTTGTCGCCC AGGCTGGAGT GCAGTGGCAC GATCTTGGCT CACTACAACC TCCACCTCCC 

6301 GGGGTTCAAG TGATTCTTCT GCCTCAGTCT CCTGAGTAGC TGGGACTACA GGCGCCTACA 

6361 GGTGAACACC GCCACACCTG ACTAATTTGT GTAGTTTTAT TAGAGATGGG GTTTCGCCAT 

6421 GTTGGCCAGG CTGGTCTCAA ACTCCTGACC TCAGGTGATC TACCCACCTC AGCCTCCCCA 
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AGTGCTGGGA 
AGAGAAAACA 
AATAAACCAA 
TTCAACATTT 
TTATCAGTAT 
TAGGTTAACA 
TGTTATTGGA 
TCACCACAAT 
TGTCCCTTTG 
GTTTACAGTA 
TACCTTTCCT 
ATGTGGACAA 
AATCTCTTAA 
CTTACAACTT 
GATAAAACCT 
ATGTGCCTGG 
TTTCAGAATC 
AAGGAAGAGA 
AGCTGGAATC 
TTCTAACCTT 
TCTTCTCAGC 
CTGCCCATTA 
CAAAGGTGAT 
TTTCCTTTGA 
CTGGATCCGC 
AGCTTCTGCT 
CAATGCAAAA 
GCCAGTCTGA 
GCCCACTTAA 
CCAGCACTTT 
CGGCTAACCT 
GGGAGGCTGA 
GAGATCGCGC 
AAGCAAACAA 
CAGAATCCAG 
TGGGTGGGGC 
CACTGTTGCC 
CAGGCTCAAG 
CACCACGCCC 
TGTTGGCCAG 
AGTGGTAGGA 
TCTATTTCCT 
GTAAGTTGCA 
TTCAACTCCC 
TAACTCTCAC 
CGGGGGACTA 
TACATTTTTA 
TGTTAGGCAC 
TGATAGTACG 
CCCCAAAAAC 
TTAGTGCCTT 
GGGGGGCTCT 
AAATCACTTG 
GATGTTAGGA 



TTACAGATGT 
CTATTAGCAA 
CTCTCTACAA 
TCTCAATGCC 
TTGAATAAGA 
AACTTAACAC 
GCCCAGAGAG 
AAGTCAGTTG 
TTTTATTTGC 
TTAATACATT 
CCTTCCCTTA 
AGTTTACCCA 
GGAGGTGTGG 
AAGTTCTTTA 
ATCTCTTAGA 
CACACAGTAG 
TACACTTGCT 
TGGAGGTAGG 
AAAGGCAATT 
AGGATCGAAA 
CCAAGAGCCA 
GAATCGTTGT 
CATTTGCTTT 
GAGTAGTTGT 
CCTGAGCCGG 
AGGATTATTA 
CGCTTCAGTG 
GCAGCTGGGC 
TTCCGATCAA 
GGTAGGCAGA 
GGTGAAACTC 
GGCCGGAGAG 
CACTGCATTC 
ACAAAAAAAT 
GAAAATAGGT 
AGCTGTTACC 
CAGGCTGGAG 
CGACTCTCCT 
AACTTATTTT 
GTTAGTGTCG 
TTAGAGGGGT 
TTTCTGCCTG 
TGTCAGGCAC 
TGGTTAACTT 
AGAATTAGGA 
GTCGGAGGAC 
AAGTAATCAC 
TAACTATGGT 
TAACTGACCT 
CGAAAAGCAG 
TTTTCCTTCT 
GAAAAGAGCC 
CCCTTGGCCT 
AGGACGCCGC 



GAGACACCAG 
CCTATTAGTC 
CAAAGTGCTT 
CAACAGCCAA 
GGGGGTCTAC 
AATGTATCAT 
AAGAATTGAA 
CACCAAGTCT 
CACACCCTAA 
GTCAAGATTT 
AATTCTTCAG 
TTATGTATGG 
TTATAGAATA 
AGCTGTTTCT 
TTGTTGGATT 
TGCCTAATAA 
GAGCCAGGTT 
AAGAGATTAA 
TGGTCAGTGA 
TTCTCGGACA 
TGTGAAACCA 
AATTTAAAAA 
TATGCCACTT 
AGGGAAAGGA 
TGTCAGTATC 
TCTCCTGCCA 
GAGTTCCAGA 
GCAGATGCAT 
AGCAGAAACC 
GGCTGGCGGA 
CGTTTCTACT 
TCGTCTGAAC 
CAGCTTGGGC 
GCAGAAACCG 
CTCTAGAAAT 
AGATCCCTAG 
GGCAGTGGCA 
GCGTCAGCTT 
TTTATTTATT 
AAGTCGTGAC 
GAGCAGAAAG 
TAATGGCAAC 
CGTTCTACAT 
TTAGGTAATA 
AAGTGAGGCT 
CAAACAAGGT 
AACGAAGTGT 
CGATCTTACA 
ACTATTACAT 
TAATACGCTT 
ACCTACAAGC 
TTTGGGTTTG 
TGTGGTGACT 
CCTGAGCAAT 



ATCAGCCTCA 
TAATATTTAA 
CCTGGCTGCC 
GTGTCTCTTG 
ATCTTAAGTA 
TCACTACTAA 
ATTCAAGTTT 
TGTAGCTCTT 
ATAAAAATTG 
ACCTCTTCGT 
AGGTTAGAAA 
ATGTTTTACT 
GTCAGCTGTT 
TAGTTTGCTC 
AAATGAATTA 
ACCATCTCTC 
CTTTTCATTT 
GCCCTAGGCC 
ATAAAAAGGA 
TACAGGAAAT 
GACCTTCAAA 
TACCCTCGGA 
TGTTTTCACC 
GGGGGTGGAG 
TGGGAAGTGG 
CACACTCGGA 
AGCGTTAGAC 
AGGCAAGACT 
GGCCGGGCGC 
TCACCTGAGG 
GGTGGCGGGC 
CCGGGAGGCG 
AACAGGAGCA 
AGATCCGGAA 
TTGTCCATGG 
AAGCAAAGGT 
CGATCTCGGC 
CAAGAGTAGC 
ATTTTTATTT 
CTCAGGTGAT 
CAAAGGTTTT 
CTAGACGCTT 
TAGGGACATT 
TACTCTGCAC 
GCCTACAGCC 
TACCAACACG 
TTAGATCACG 
AAGCATTAAC 
ACAAACAGAC 
TGCTCAAGGT 
AGTGAGGTTA 
ATAGCGTTTC 
CTCGGTCTTC 
GGTCACCCGG 



GAAGACATTT 
TACTTAATGT 
TAAGTCATTG 
TATGCCAAGT 
CTGCTTAAGA 
ATAGACCGAA 
TCTCTCTCTC 
TACTGAGCCA 
TACTGGCTTT 
GTAGATTCCC 
GCCATTAGTA 
CTTTCTATTT 
ATAAGTACTG 
ATCTCAAAAT 
ACATACTGGA 
TTATTCAGCC 
CAAGGTGAGC 
AAGGTCACAC 
TTCCAAGGCC 
GCTGGGGGGG 
TCTGATGATT 
AAATTCTAAT 
CAAATGGGAC 
GGAGGGAAGA 
GAGGCGCGTC 
TTTGAAGGCT 
TAAACGACTG 
TAGCCCGCCT 
GGTGGCTCAC 
TCAGGAGTTC 
GCTTGTAATC 
GAGTTTGTAT 
AAACTCCGTT 
GAAAACCTCG 
TCCCAGATCT 
TTTTTTGGGG 
TTACTACAAC 
TGGGATTACA 
AGTAGAGAGG 
CAGCCCCCTC 
TGAGTGGCCA 
GAGCTTCTTA 
AGTCTGTTTT 
TTTAGCAGGA 
TAAATTGAGA 
TTAGAGTTTT 
AGGCATCCCT 
TAGAATATTT 
CAACCTTTAG 
TGGCATAAAA 
GCTCTTCCTT 
CGGGAGCTCA 
TTAGGCAGAA 
CCTAGCAGTT 



TCTATTGGAA 
CTTCCTTAGT 
ATTCATTCAG 
TCTATGCTGA 
TGAAAGCCTC 
TACAAAATCT 
CTTTTCTCAC 
TGTTTTCACG 
TTTTCCCTGG 
TGGGGAAAAT 
ACATTCTGGT 
TTCTGACAAT 
TTTTCCTGGC 
TCGGAATAAG 
AGCTCATGAA 
TGTTTTCTGA 
AAAAGCATAC 
ACCGATTGGG 
CATAAGGCAA 
GAAAATCCGG 
CTCAGCCCAG 
ATGTGGCTAT 
ATCCAACCCT 
GCGGAAAAGG 
AGCAGTAAAC 
CCAAACGAAA 
GGTCTGTTTG 
AGACTTTTCT 
GCCTGTAATC 
GAGACCAGCC 
CCATCTACTA 
GCAGTGAGCC 
TCAAAAAAGC 
GCGAGATTCA 
CCATTTCTTG 
GACCGTGTCT 
CTCCGCCTCC 
AGGTATGTGC 
TGTTTCACCA 
GGCCTCCCAA 
CAGGCCCCAC 
AAATACAAGA 
ACAGACACCT 
ATGGGACCTA 
AAAAAATAGA 
GCCTTCAATT 
GCATGTAAAC 
CTTTAGAGTA 
TAACAGCGCT 
TTAACTTACC 
TGAAACGGTA 
GATACCTGTC 
GCACGGCCTG 
TGTTGAGCTC 
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9721 CTCGTCGTTG CGGATGGCCA GCTGCAAGTG GCGCGGGATG ATGCGAGTCT TCTTGTTGTC 

9781 GCGAGCCGCG TTGCCGGCCA GCTCCAGGAT CTCGGCGGTC AGGTACTCTA ACACCGCCGC 

9841 CAGGTACACC GGCGCGCCTG CCCCAACCCG CTCTGCGTAG TTGCCTTTAC GGAGCAGGCG 

9901 GTGCACTCGG CCCACCGGGA ACTGGAGACC AGCGCGAGAA GAGCGGGATT TCGCTTTGGC 

9961 GCGAGCTTTG CCTCCTTGCT TACCACGTCC AGACATTGCA ATCAGACAAA AATCACCAAA 

10021 ACCAGCGGCC TAAGCTCACG AGAAAACAAA CAAAATCAAG AAATATGTAA AACATGGCCG 

10081 CTTTTATAGG TAGTTCCTGG GGAGTAAATC CGACTTTTTG ATTGGTCGGT AGCAAATGCT 

10141 AGTCAGATAG CCAATAGAAA AGCTGTACTT TCATACCTCA TTTGCATAGC TCTGCCCACG 

10201 GATGACAACT GTGCAGTTTG TCTTCCAATT AACTAAGAGG TACTCTCCAT CCCTCATTAG 

10261 CATAAAAGCC CTATAAGTAG CAGAAATCCG CTCTTTACTT TCGACACATT TCTGGTGTTT 

10321 TAAGATGCCT GAGCCAGCCA AGTCTGCTCC CGCCCCGAAG AAGGGCTCCA AGAAGGCAGT 

10381 GACCAAAGCG CAGAAGAAAG ATGGCAAGAA GCGCAAGCGC AGCCGCAAGG AGAGTTACTC 

10441 TGTGTACGTG TACAAGGTGC TGAAACAGGT CCATCCCGAC ACTGGCATCT CTTCCAAGGC 

10501 CATGGGCATC ATGAATTCTT TCGTTAACGA CATATTTGAG CGCATCGCGG GCGAGGCTTC 

10561 CCGCCTGGCG CATTACAACA AGCGCTCGAC CATCACCTCC AGGGAGATCC AGACGGCCGT 

10621 GCGCCTGCTG CTTCCCGGAG AGCTGGCCAA GCACGCCGTG TCGGAGGGCA CCAAGGCCGT 

10681 CACCAAGTAC ACCAGCTCCA AGTAAACATT CCAAGTAAGC GTCTTAACAC CTAACCCCAA 

10741 AGGCTCTTTT AAGAGCCACC CAGATACCCA CTAAAAGAGC TGTGGCCAGA CGCCAAATTT 

10801 TATTTGGCGG CGGAGGGGTA TTAGAATATA GGAACTGGAG AGGGGTGGGG ACAAGTGTTG 

10861 CAGCTTAGAG AGGGACAAAG GGTCCTGAAC CCGAAAGAAG CCAGCCATTA AAAATGGCTT 

10921 TGGGGTCAAT TCGTTGTGCT TAAATTTAAA ATGGAGACAA GCGGCCATTT TGCTAACTCG 

10981 GCGTTCCCGG AAGAAACCGC AGGCTCGCTT AGGTTTCAGA CCCAGCTGTC TGTCCCTGTC 

11041 TACGTCGCCA GGATCAACGG TTGCCGTAAT GTCATAATTT CGCCACCAGC TTCTAGCCAA 

11101 TAGGCTGTCC TGTCATTTTA AATATTAACC AATCGAGGGA AAGCTGTTTT GAGACTCTGA 

11161 TTTACATAGC GGACCGGAGT GGGAACCTGG GCAGTAACTG CCTAAGGAAG GACTCCCCCT 

11221 CTGTTTTCGT GGCGCACACC TTCGTAGTAT ACTGAAGGGT GTGTCTCCTG GGTTTCCAAC 

11281 TGCCCCGGTA ATAGTCTTTT AACCTAATAT GCGTCAGTTT TGATAACAAC ACTAAGGCAG 

11341 TACAGAACTA AAGATGTAAG CACTGCGCCA GATGTTGCTT CATACATCTT ATTCTATTCA 

11401 ACTGGTTTAT TCAAGATTCA AATCAAATCA AATTTTGCTT GAATCCCAGT GCTCAGTCAG 

11461 CCATAAATGG TGTGTTGCCT GATTGAAACT TAAAATCTCC GTAGGGGGCT TGTAACATGC 

11521 AGACAAGTTT GAAAGTTGCT TTAGGAGAAG CCAACTCTTA ACTGCTGGGT AAATTGACAA 

11581 GCCTTCGAAC ACTGAACTGA AGGCCAGTAA GGACTAGGCG CTGGGTGGGG GAGAATGAAG 

11641 AGGAGACGTC ATTAAACTTA GCACATACAC TGTATCTCCT AGAGGACTCT CCCTTCCTAG 

11701 ACAACTGCAG GCCGCTTTGT GGCCTGGGAA ATTCCACATT CCCTTAAGTA TTTTACTCAT 

11761 GGTCTTTTCC AGGTAAAGAT TTTAAGATGA AGGGTTAGAC GTAGTCTACC TATCTTTTTA 

11821 TTCAAGTCTA GAACACGTTT TTAGCACCTA GAAGTTTGCT TTCTCCATTA AAAACCGGGA 

11881 ATATACAATA AATAAAATTA GTGTTAAAGC AGATTTTTAC AAACTTAAAT ACCATGTAAT 

11941 TTAGGTTACA GTTATTTAAC ATAAGGACTG TGTGATCTTA AATCTGCAAT TTCTTTCACA 

12001 CCTGGGAAAT AAACTAAGGC CTGTCTTTGG TGCCAGACAA GGCCTTATAC TTGAACACTG 

12061 CTGTGCAATC ACAGGCTGCC TTGCCTAGAT AACTTATCTG AGAAATTCTG ATGAGAAATG 

12121 AAATTTCCAG AGTCCCTCAC AAGTAAATTT TTTTTTCTTT TTTTTTTTTT TTTTTGAGAC 

12181 GAAGTTTCTC TCTTGTTTCC CAGGCTGGAG TGCAATGGCG CGATCTTGGC TCACAGCAAC 

12241 CTCCGCCTCC CGGGTTCAAG CCATTCTCCT GCCTCAGCCT CCGGAGTAGC TGGGATTACA 

12301 GGCATGCGCC ACGACACCCT GGCTAATTTT GTATTTTTAG TAGAGACGAG GTTTCTCCAT 

12361 GTCGGTCAGG CTGGTCTCGA ACTCCGGACA TCAGGTGATC TGCCCGCCTT GGCCTCCCAA 

12421 AGTCCTGGAT TACAGGCTTG AGCCACCGCG CCGGGCCTAA ATGGTTTTTT TTTTTTCTAT 

12481 GCCTCTAATG GACCTGGTCA CTTATTCCCA TTCAGACTGA CCGCTCTCCT ACCTGCCAAC 

12541 TAACTAATCA GTGTAACCAA AATCTGCAAA CAAAATTCAG TATTCTTTCC CCGCCTTTTC 

12601 CCCTTTCTCT TACATAGATT ATGTTTTTGC CTGTGTTAGA TGAAATAATT CTATTGCTTG 

12661 TTCTCTCTTC TGTACAAGTA CCCAGTAAGC AAATTATTAA CTTCTTGGTC ATTTATTTCT 

12721 GAATTTTCCA CCAAGACAGT GTTTATGTGA GTCATACAAT AAGAACCAAC AGAAATGTGT 

12781 GTCTTGGAAA CAGGTTGTCT ATCCCTGGAC CCTTTGAGTT TTCTGTTCAC TTTCCTTTGG 

12841 CTTTTGCATG CTAAAAGTTT ATCGTCCGCG TTTGTTTGTT TTGGTTATTC TAATTGGACT 

12901 TGGCTGATTG GTTGCATATT GGTGGCAGTA GTAGAATTTG AATTCTGGTT TTCTGGTCAC 
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12961 ATCATTAAGT GATTAGTCAG TGGAGAGGAC AGGAAATCTG GTTTATTTAT TAACCTTTTT 

13021 TTGGGGTGTT TTTGTTTGAA GATGTTGATA TTCTCTGTGA GGACACAGGG TTAGAGTTGG 

13081 TGTTTTTCTT TCTGACTTTA CATGGGATTT GATGTTTTGT GCTTGTATGC CTCTTTCCAC 
13141 CTTCCAAAAC TTGTCTTTTT TGAGTCCAAA TAGTTGTCGA TATCTGCAAA ACCAGTATTC 
13201 CTGTGTTAAG ATGATATGAA TATAAAATGG CTGCCCTGTT ATAACTTTTG ACTTTAAGAA 
13261 AGTGTTAGGA CTAACAGGAG ACAAAAAGGA AATCAAGGAA ACCGAATGTC TGGTCTCAAT 
13321 AACTGCTATG GCAGAGGCTC TACAGCTTAT TATTAATTTT AGTAATTTCA CATTATTGCC 
13381 CCTTCACGTT CTTTAAGTAA GGTTAGAGGA CAGAAGAAAC ATAATGTTGT TACAAATTGG 
13441 ACTATTGAGT CAGGGAAAAA AAAGAGTGCT TTCAATATCT GAATAAAACA AAGATTTAAT 
13501 ATTTTCTAAA CCTTAACGAG TTTATTGTAA GGGATGTGAT GCTGGAAACT AGGAAACTAG 
13561 AATTTTCTTC TAAACTGAGA ATCAGAATTA TTCATATTCT CAGCAGTGGT GCCACCTGAG 
13621 GGACTTCTGA TCTTAATTAC ATACTTTTAT TTCTTTAACT GATCAACATG CTAAATAGAT 
13681 AACCTATGGC TCTGTTTTTA CCCACTTTAA ATTCTCTTCT ATTAGCACGG TTAGCTTTCC 
13741 TAATTGGCAA TAAGATTGAG ACTATCTTTT TTTTTTTTTT GAGACAGAAT TTTGCTCTGT 
13801 GGCCCAGGCT GGGQTGCAGT GGCACAATCT CGGCTCACTG CAACCTCTGC CTCCAGGGTT 
13861 CTAGCAATTT TCCTGCCTCA GCCTCCCCAG TAGCTGGGAT TACAGGTGCA CCACCACGCC 
13921 TGGCTAATTT GTGCATTTTT AGTAGAGATG GGGTTTCGCC ATGTTGGCCA AACTGGTCTC 
13981 GAACTCAGGT GATCCACCTC GGCCTCCCAA AGTGATGAGA TTACAGGCGT GAGCCACCGT 
14041 GCCCAGAAAA GACTATCTTA TTTTATGAAT TTAAATAATT GTGAAATTAT CCACTTAAGG 
14101 GAATTAATAA ATTATAATGT AATCTTAAAT TTTAGTTGGC TTACATAAAG ACTTAAAATA 
14161 CATCAATTTA AATAAAAACT CATTTGTCTA AAAAAAAATC AAAAATTTTC CTTGTGCTTT 
14221 AAATGTGCTA CCTCTTTAAG TTCTAATTAA GAGAAAAAAA GTTTAACTGT GAGTTTCATT 
14281 AGTGGTCTTA GTTAACAGCT TAAAGTATTT TGTAAAAAAA ATACTTCACA ATTTTTAAAT 
14341 AACTTAAAAA TATTAATACC TCTTTTATTA GGTTTTTTTA ATAAGGAAAA TATATAATAC 
14401 ATCTAATCAA GATTTTTTTT GGACAAATTG GCTTAATAAT TTCATTTTAA AAATGGCTTC 
14461 TTTATTCTTA TACTGTAAAA ATAATATTAG CAGAATATTA TAGTATACAC AAGTTTAGGG 

14521 TTCATATTCT AAAAAACAAA AACAAAAGCT AATTTAACTT GCATTTACTA AATTTCTTCC 
14581 ACTAGTTGTA CTGGTTACAT GAGTTAACAT CACTTTATTT ATTATTCTAA AATTGTAAAT 
14641 TATTCATTGA ACCAAATTAA ATGATAATAG ATAATGTCAT TTTTAAAAAT GGAATTAAAT 
14701 TTTATGTTAC TAATTATAAG GATTCAATGT GTGAGCTTAA GTACTGAGTT CACAGTGTAT 
14761 GATAACTTTA AGAATTTAGG TGAATATTAT TAAATTGAGT AAATTAATTC TCAATCTTTG 
14821 GATACCTGGA CAATTTCTAA ATTGGAGGGT ACAAAATACA AATCACAAGA AACAGTGTAG 
14881 TTTTATGCAA ATAACATTTT TACACAGTTT AGAATAACCA TTGATAAACA GATAAGAGAA 

14941 CATATGATTG CCTTAGAATA GATACTGTTG CTTTCGCCAC TTTAGATTTG TAAATCACGT 

15001 ACTGTATACG TGTGGGCGTA GAGGACCATG CAGGTTTTGG ATGACTGCCT CTGTTTTCGT 
15061 CATGCCTATG CGGGAACACA ATTGCCTGCT TTGTTTAAGG GCTATGGTTA ATCCAAACAG 
15121 CTCTGACTCT ATCAAGTACT ATAGCTACAG AGAAACACAA GTAAGCATTC GAGATAATGA 
15181 CTACCTTGAG CCTTTACTTA TTTAAAAAGT TGTTACTGTT TGTTAATGTG GTACATTCAA 
15241 TTTACTATGG ATTGTCACTC TAAAATAAGA CTTCAATCTT TTTCTTATTT TTATATAGCC 
15301 ATGATTTATA TTCATATCTT AATGTAATAA CCAATCTTCT CTGACAACAT TATAACAATG 
15361 CTGGAACCTC CATTTTCAGT ACTTCAAACA ACAAATACTG CTTTTATACT TCAGAGCAGA 
15421 TGGATATGTG CTTCCCAGTG TAAACACATT TGGAATCTCA CTGAGAAATA CACTATCACT 
15481 AAAAATACAG TTCTGAGATT CATTAAAAGA CCTCCAGAAT TCTGGAAGTA GGAAGTTTCC 
15541 TCTTCAAAGT CTACAGAGGA AGATGAGGTC TGAAATAGAC AGCTTCTTCC TTCTTTTACC 
15601 TGTGGTATTA TTCTGTTTTG TCCTTTTCTC CATTATCTGT CTTTCCAGTG ATGAAATTTT 
15661 GATCTGGCCC TCCCAAGTAT TAAAAAACAA GCAAATAAAC AAATCTCAGT TATATTTTAC 
15721 TAAGATATTG GCATGCTAAC TTTTTGCAGG TTTGTAACAA GGACCTTTAT AACTTGACTA 
15781 AAAGTTCCTA AATAAGAATA TTTACTAGAA AATTTATTTC TGCCTGTGGC CCACATTTGA 
15841 GTCAAAATAA TCAATTAGGA AAAATGAACT TGTTTAACTA AAGTTGACCA AACTGATCTT 
15901 TGACCAAACT GATCTTTGAG ACCTATTCAT CTAAGACAAG CCAATTAAAT TCTTGGAGAC 
15961 AATTTGTACT TTAAGGAATT CTTATAATAT TTGTAATTAC CCTCATAACT TTTTTTTTTG 
16021 CCCTACTTCT GTGCTTCTCT AATATGCAGA TTATTAAATG TTGTTACAAA GCCATTGTCA 
16081 AAAAAACAAA AAACAAAAAA CTAAACAAAC TCACATGGTT AGACTTGCTC CTTTATGAGA 
16141 TATTTTTACC AAAAATGGAG GAGTTGAAAA ACTCTGGTGC CAGAAATCGT GAAGACATGG 
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16201 

16261 

16321 

16381 
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16501 
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17161 
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17401 

17461 
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17641 
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17821 

17881 
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18421 

18481 

18541 

18601 

18661 

18721 

18781 

18841 

18901 

18961 

19021 

19081 

19141 

19201 

19261 

19321 

19381 



CCTACCTAAC 
GCTGCACAGC 
ATTGTTTTCr 
TCTACTGAAA 
TCCCATTTCT 
TGCTTGATAT 
CAGTTTGACA 
AGATGGGGTG 
ATGCTAGTAA 
ACAGACTCCA 
CATTTAATCT 
CCTACCTTCT 
ACTGCTGTTT 
GCTTATGACT 
TTATTTGATA 
AAAAAATACA 
ACTTAAAGAA 
TTCAGCACAT 
T^CTTTTCCT 
AGAGTATACC 
GTCTGAAGAT 
AGAGTAAATC 
AGAGACAGGG 
ATCTCATTGT 
ATCTGGAACC 
TAGAGATGGG 
CTCCTGCCTC 
ACTTCAGTTC 
AAGATTCATG 
GGTCTCATGT 
AAGGGTAAAA 
AGAGGAGGCT 
TGACCATATC 
TTTGTTTTCA 
TAGTTTTCTT 
CAAAAAAAAA 
GGGTAATGAA 
AATAAATATA 
CAATACACGT 
GATGCTCAGC 
TCTAGGCTCT 
TGAAAAACAC 
ATGGAACTTT 
GGGCTAGAGG 
AACTTAACCT 
ATCTCATAAG 
ACACTGCCTG 
TCAGAGTCAA 
CTCTTTTCTC 
CTGGACACTG 
AATGGAGGTA 
CTCTTTTTTC 
ACATAAAGAC 
ATATACTCAT 



ATGGAAATGT 
CAATCTTAAG 
TTTAACATTC 
ATAATGTGCA 
ATTTTATAAA 
ACCTTGAGCA 
AACTCAACTA 
AAGATTAAAT 
AATGGCTGCA 
AGTTTGACTC 
CTCTGTGCAT 
AGAAGTATGT 
GACAAATTTT 
GAAGACTTTG 
TGAAAATGCC 
CTTTTTTTCC 
TGATAACTAT 
TGACAGACAA 
ACCTTTAGCC 
ACTGTAACAT 
CAGTTTGACA 
TGGAGAATGA 
TCTCACTTTG 
AACCTCCACC 
ACAGCAGGTG 
GTCTTACTAT 
AGCCTCCCAA 
TGAGGAGGAA 
TAACCTTATC 
TTCTACAGTT 
GAGCAGAAAT 
ACCTGTGGTA 
AAGTTTTCAA 
CTTTTCTCCC 
TTCACTTTTT 
TTGAAAATTA 
CCTTGGACAC 
TTTTTAACAA 
TGTGAGATCT 
AGGCAACAGA 
AAAAATCAGA 
TAAGTCTTTT 
AGGACACTGA 
ATGTGGGTTT 
CTCTGTGCCT 
GTTGTTGGAA 
GCACAGAGCA 
ATACAATATC 
CAGGGGGAGA 
TTTCATCTTG 
TTTTGAACAA 
TATGCATAAA 
AAAATTAAAA 
ATTCATATAT 



TGGTTGTCAG 
TGTTTCTAGA 
TTGGTTTATA 
AACATAACAT 
TCATCTTTTT 
CAAGTAAATA 
CCCTGAGCCT 
GAAATAGCAC 
CAGCACTGCT 
CCAGATCACC 
TAGTATCCTT 
GAAGATTAAA 
ATTTATAACC 
GTAGGAGTTG 
AGTTGATCAT 
CTGAACATAT 
CATTTCTCTT 
TCCCAGTAGT 
TGTGTAATCC 
TTCCTGAAAG 
TATCCTCAAG 
GCCACTTTCT 
TTGCCCAGGC 
TTCTGGGCTG 
CACACCACCA 
GTTGCCCAGG 
ATTGTTGGGA 
AAAATATGTA 
ATCCAATGCG 
GCTCATGCCT 
GATGGGGCTT 
AAACCTTATC 
ATGGTAAAAG 
TCCTCTCCCC 
TGTCTACTAT 
AAATGTGCCC 
TAGATTTTAA 
TTAAAAAATA 
TGAATGGAAG 
GTAAGAGCAT 
CAGTCCCCAC 
TCCTCACTGG 
CTAGGTTACA 
ACTGCACAGG 
TAATTTCCTC 
CAACTAAATG 
AACATCCAGT 
TCATATCTGA 
CAACAGCTTT 
CAAATAAACC 
TCAAAGAAGG 
ACTATTAAAA 
TAACTCCTAG 
ACATATATCT 



TGGAAAATAC 
GAATCACTAA 
CAAGAAGAGA 
CCTATTCCTA 
AAAATACTTT 
GTATGCCAAA 
ATAGAGTGGT 
CTATAGAACA 
CAATGATGAC 
ACATATAAGA 
CTCTATACCT 
GATCCTTAAT 
ATCTTTACGC 
GCCTTCTATA 
AGTATGTTTA 
GAAATTAGCT 
AAATCTTCCA 
CCTAAATTAA 
TGGATGACCA 
GTATTCTAGG 
TATCATGAGT 
TACTACTCCT 
TGCCAGGCTG 
AAGCCATCCT 
TGCCAAGCTA 
CTGGTCTCAA 
TTACTAGTGT 
ATAATAATGG 
CAATTTGTAG 
TGATAGTAGA 
CTCTCATTCT 
CTCATCACTT 
AATTGGATTC 
CCATTCTCCC 
TATTTGCCCA 
CTTTTGTTGT 
AACACACACA 
AAATTGCATG 
GAAAACTGCT 
GTTGGAGGGT 
GGCCTGGCCT 
ATAAATTTTT 
TTCATCTTTT 
CTCATTATCC 
ATCTATAACG 
CATTGGTATC 
GAACTTTAGC 
TAAATTACAG 
TAGACATATC 
AATGAAAATG 
ACAAATGAAC 
TATTCTTCAT 
TATCTCCTAT 
CACATCATGT 



TACACAGAGA 
TTGTTTCTAG 
GTATCCATAC 
GACAGTTTGT 
GTTGAGTGAA 
AATTAAATGT 
AATAATTGCC 
CTAGTTCCAG 
AAAAAGTGAA 
TGTGGGACTC 
TTACAGTGAT 
GCATATAAAC 
TCCTAAAAGG 
AATTATAAGA 
CCGGGGTCCA 
CTCTAGGCAT 
GATTTGGAAG 
AAGACATTAA 
AGCATAAAAT 
CTCTGAGTAA 
TCATTATAAT 
TGACCTCAGT 
GAGTGTAGTG 
CCTGCCTCAG 
ATTTTTTAAA 
ACTCCTGGGC 
GAGTCACTGT 
GACTTTGGTT 
AATAATTAAT 
TCTCCTTGCT 
ATGAGGAAAT 
AAAATTCTAG 
AAGAGAAATA 
TTCCTTTATT 
AACTCAACTG 
TAGACTTGCT 
TTTGAGCTTC 
TTTAAAAAAT 
AGCCTCAAGA 
TTAGAGAGTG 
TCGTCGCTGT 
ATCCTTCAAG 
AAGAGCGTAC 
AACAGCTGTG 
CAGGGAGAAT 
TATTGTGTAA 
CATCATCATT 
AAGTGAATCA 
TTTTCCAACA 
AGTGATCCTA 
ACCTGGCTGA 
AGAAATTTAT 
TCTTTTTATA 
ATCATATATA 



TAGCCATAGT 
AGAATCACTA 
TAAACTCTTT 
AGTTTTTTTC 
ATCAGTCCAT 
CTTTCAGTCA 
CTACTCATAA 
ACGTGGTATC 
GCTTCTGGAG 
TGAGGCAGGT 
GGTAATAGCA 
CACTGTGTTT 
ACTTGAAGCA 
ATTTCATAAA 
ACAGGTTGAG 
ATTCCTAAGG 
GATATATATA 
AAATTAGTGA 
TAAATTGAGT 
TTTCTTTGGG 
TAAGAAAAAG 
TCTTTTTTTC 
GCGCAATCGC 
CATCCTGAGT 
AAGTTTTTTG 
TTAAGTGATC 
ACCCCGCCCC 
TGCTGATTTA 
AGAGACATCT 
GCTGGCTCAG 
AGACCTATGT 
GCTTATTCTC 
TGAATAAACT 
TTCTTGTCCT 
TAGGCTAGAA 
TAAACAATTG 
AGTGCACTGA 
CTGCAGAGAA 
GTGGATCAAA 
TGCTCAGGGT 
ATCTTCTTTA 
TTTAGATCAA 
AGACATTCAA 
CTACCTGGGA 
GACAGTAGGT 
AGTGCTTAAA 
ATCATTGTTC 
ATCACTCTCT 
GTCGTCACTG 
GAAGAAGATA 
GAAAAATTAG 
GACACAGGAA 
TGTATATTAT 
AAATAAATTT 
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AGGTGTCATG 

TATGGATATA 

TGCATTATAT 

TTATTTCCTA 

TCTAAGGTTC 

CCCAAAATAA 

TCACTGTTAA 

ACTCTATTAG 

AAACTGCACT 

AGAATAATAC 

CTCTGAAGGG 

TGTCTCTTAT 

TCCCCATCCC 

ACAGGACTTA 

ATAGGATTGG 

ATTTATTTAA 

AAGGCTATGT 

CAACAGAATC 

AAATTTTGTT 

AAGACAGCAT 

TAAACTTGTC 

CCTCAATCCA 

GCCAAATGTC 

CCTCATCTTG 

AAACAAAAAT 

GCTTAATGAT 

GCACTGCTGA 

ATGCTGTATG 

TGCATCTATT 

ATGGGACAAC 

AGAGGTGCTA 

TACAGAAATT 

GCTAGGCTGT 

CTTCTATCAT 

CATGTCTAGG 

AGGCATGTAC 

TGCTTTTCTC 

TCTAATGTTT 

AAAATTATTT 

GTGTTGTTAG 

TCTTTACTTA 

GTGGAACAGA 

TCTTTCTTCT 

TTCCTCTCTC 

AGCAGAAACC 

CTATACTCTC 

AGCACATAAT 

AATTCTTTAG 

TTTTTCTGTG 

ACAACTTTTA 

TACTCAGTAG 

TTTGAACAGA 

ATAAAQTTCA 

TAAAAATGTT 



ATATATATTT 
TTGATAATTA 
TATAGATTAT 
AGGATAGACT 
TTAACATATA 
TATGGAAATT 
CCTAATAGTC 
TGAGGGTCAT 
GCAAAATGAT 
CTTAGGTTAA 
ATACTAAACT 
AAACATTATA 
CCCAAATTCA 
TAAGAGGTAA 
TGGCCTTATA 
AAGAAAAAAA 
GAGCTCTCAC 
CAGCCATGCT 
GTTTAAACCA 
CATTGCTGTC 
CAAGGTCACA 
AGGCCAGGAC 
CACACCCCAG 
AATAAATATG 
GCAAAGTATG 
ATCCTTATAG 
TAGACTGTAA 
TTTACTTTTT 
TCTTCAATGG 
TGCACATGAC 
CCCACTAAAC 
CACTTACAGT 
TTTGTTGGGG 
CCTGTGTTAA 
GGTCATATCT 
CATTTTAATG 
TGACTATTCT 
TTCTCTCCTT 
CAGTCATCCA 
CATTATACAT 
CATATAAGTA 
AATAAGATTA 
GGGTCAGGTA 
CTCTTAGACA 
TACTGAACAA 
TCAGTGATTT 
AATTGTTGTC 
TTTAGAGACC 
TCTCTCAGCC 
CAAAACATGG 
TTGAACCAAA 
GTTAAAGTTA 
AAAATAGTGC 
AGATATCAGG 



AGATAAATAT 
TGTATTTGTT 
ATAGCTCACA 
TCATGAAGTG 
CATTGCCAAA 
CCTGTTTTAT 
CTTCAAAAGA 
TCTTCCCATG 
AAACATGACA 
GGCCACATAA 
GCATTTAGCT 
ACTACTCTTT 
TATATTGAAG 
TTAA6GTTAA 
AGAAGAGGAA 
AAAAAGAGGA 
AGTGAGAAGG 
ATACCCTGCT 
CACAATCTAT 
ACTTACAGAC 
AAAGCCAGAA 
TCCTCCACTC 
AGTCAGCATT 
ATCTAACAAC 
TAGAAAACTA 
TCTTGGAGGG 
ATTGGTCCTA 
TTATGGAAAC 
GTATGCACAG 
AGTCAAAAAT 
TAATATTTGT 
GGGTTACCAG 
GCTGGCAGGA 
CCATCTTCCA 
ATGTTCCATG 
CACACCTTGG 
GTATTCTGGA 
GCTTTCAAAA 
GTAATGAGCT 
GTTAAGCATT 
CTTATATACT 
CCTAGATGTT 
CTCCCAGAAC 
TTTTCCAGGA 
ATTATTCAGG 
CCCTGCCTTG 
ATTGCTTATG 
AAGTAATACT 
CTGTAATAGC 
AATTATCTAC 
AAAAGCAGTT 
ATCGTAAAAT 
TTGAAAAAGG 
AAAAGCCAAG 



ACTTAGAAAC 

ATTGACTACT 

CATCTTTGTA 

GAAATACTAA 

TTGCTATTCA 

AGCACTCATA 

AAAAAAAATT 

TTTCTTGTTA 

7CAATCATTA 

ATATTTATCA 

GCATGCAACT 

GAGAAAGTGT 

CCATAAACCC 

ATGAGGTCAT 

GATTCTGCAC 

AGAGAGGGAG 

TAGCACTCTA 

CTGAGACTTC 

GGTATTTTTT 

AAGAAAACTA 

ACAAGTGAGG 

CACATGTAGA 

AGACCAAGAT 

TTACCCATGT 

TGTTTACCAC 

GTTTGTATAT 

GAGAGAAAAA 

ATATGATATA 

TTGAGCTGTT 

CTCAGTCTCA 

ATATCAATTA 

AAGGGATTTT 

GCTGTCTAGG 

TGTATCTTTC 

CAGGAAAAAA 

TTTTCAGATUV 

TTACAACGCA 

ACTGACTCAT 

GTTCATAGAA 

GAATAAAAAA 

TATAGCTGAA 

TCTCCTATGG 

TTCCTAATTA 

CTACAGAAGA 

CTCATCTGAA 

GGGTCAATTA 

TTTGGATTTC 

TAAAAAAAAA 

ATCGTACTTA 

ATACCCTTTC 

CAAATAAAAT 

AATGTCTGTA 

AAGAATCATA 

AAGTGAGTAT 



TTTTTTATGG 
TCAATTGATT 
CATAAATCTT 
ATCAAAAGTG 
GGATCATACC 
TTTACAATAA 
GAAATTACAT 
GCCATGACCC 
CATGGGAAGG 
GGTGCCTTTT 
GAAACTACTT 
TTACTATGGA 
CAATATGACT 
TAGGATGGGT 
TTGGTCTTCC 
CTCTGCACAT 
CAAGCCAGCA 
CAGCCTCCAG 
TATGGCAGCC 
AGACTAGGAG 
TGAGAAGTTG 
TAGCCACCTC 
GTCTTACCAG 
AAAACATTGA 
TTAACTGACA 
GTGGTGAAAC 
TAAATAAACT 
CCTGGAAATT 
CCCATGCACC 
TGAAGTCGAC 
TGGATACATT 
TTTTCTTGAT 
CTGCCCAAGT 
AACCTCATGG 
GGGTAAAGGG 
ATTTAAGAAG 
ACAGAAACGT 
TAACCTCCAC 
ATGTTTTGGA 
CAACATGATG 
AAGAGAGGTT 
GTGATTTTCA 
AATGGTGGCC 
TGTGCAGTTT 
CAGAGAGGAC 
TTGTCTTGGA 
ATCTCCCAAA 
TTTTGTQTGT 
CACTTGTTAG 
TACAAAACAG 
ACTTQAAAAT 
AAAATTATTG 
TGAAAAGGGA 
GGTAAGAGTG 



ATGTATAATT 
CCCATTTTTA 
TGTTCAAATA 
AAAAACATTT 
AATTTATAAT 
ATTTTAAAAA 
TATTTTAATG 
TATAAGAAAT 
CACTATATAA 
CTGCGGAGGA 
TTACCTACAT 
CTGAATTGTC 
CTATTCCTAG 
TCCTAACTGG 
AAATTAAATA 
ATACTGAGGA 
AGAGAGCCCT 
AACTGTGATA 
CAAGCCAACA 
AGAGAAAAGT 
ACCTTGTTCT 
ACAGTCAACA 
GAGACAAATG 
ATCTCATGAG 
GTGATAAAAA 
AGGTGCTCAC 
GGAAGGAGAT 
CGATTGACCA 
AGGCACTGTA 
ATGCTCATGG 
GGGCCACATT 
TGGCAAGAAG 
ATGCAGGTCT 
TCATCTGCAG 
AAAGGGAAGT 
AAAGACTTTC 
CACCTTAAAT 
GTGGCTTGGA 
CATCAAGTCT 
TGGGTAAATT 
GAAATGTCAG 
GCTATGCTGA 
CTGATCTTAG 
ATAAATGAGT 
ACCTTCTCTG 
CATTGATTTA 
ATAGATGGTA 
GTGTGTGTGT 
ATTTTTAGAG 
ACAAATTAAA 
GAAGAAATCA 
CCAATCAAAT 
CTACTCATTT 
CTGTCAAGTG 
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22681 AAACCCTGCT AATCTCACTG AACATGTAAA AATCTGTAGA TGCCTTTATT TTATTCACTC 

22741 ACACACATAT GTAGAAAGAG AAATATATGG TAAACATTAA AAAAACCAAA TTAGAATGTA 
22801 AAATTAATAC TTTAAAAAAT GGGCTGTATA CTTTTCTTAT CACCGGAGAT AAGAATTTAT 
22861 TATTTTTAAA ATAAAGTTAT TTTCTCTGTG ACTGTTTCCA TGACTTTGCT ACTTAGAAGT 
22921 TAGAGATGCC AAAGTTTATC TAAGAAAATG TTTATGGAAA TATTATTTCA ATAATGAATG 
22981 TTTAGAAGAC TGAATTTCCT GACTGGGCGC AGTGGCTCAT GCCTGTAATC CCAGCACTTT 

23041 GAGAGGCTGA AGAAGGAGGA TCGCTTGAGT CCGGGAGTTC AAGAGCATCC TGGGCAACAC 
23101 AGCGAGACCC TGCAGCAAAG TAAAAAGAAA AAAGAATTGA AAAAGGAAGA CTGAATTTCC 
23161 TTTGGGCAAG TCATGTGACA TTCCTGTGCC TCAGTTTCTT CATCTATAAA GTTAATTCCT 
23221 ACATTTTTGG GGAAGGGAGA GAAAAACTTA GGATAGTGAC TGGCACAGAA GAAGCACTAT 
23281 ATACTATATA TATGTGGATA TCATTTGTTT TTATGGTACC ATTTTAGCTA TCTAATGCAA 
23341 AATATGAATC TTTTTTTTCT GGGTCTTAAA TTATGGAATG TAAGAATTTT CTAAATTCTC 
23401 TAATTCTGTG TTAGTTTTAA AGCAATGGAG TAACGTATCT GTCAACTTGT AAATATAAGG 
23461 ATCAACCTGA TCCACAATTT GACCCCTAGC CACTAATATT TAATAGTACA ACACTCAGAA 
23521 ATTATCAAAG GTCAGAGAAG CCAAACAAAT GTAAAAACAT ACAGGTGCTC AGAAAGATGC 

23581 ACCTGTAATC TCTCTAAGGA GAAATATTTT CCAAACTGAG TGACACGGTG CTTTAGTGAG 
23641 TTGTGGAATC AATCTCATGA TTTCCAACCT AGTGTTCTTT TAAAAATGAA CTAGTCCACA 
23701 GTAGAATATA CTAAAGTGCT GGTGCTTAAG ATAGTATTGT TTTCTGGAAA AAAAAAAAAA 
23761 ATTTTTTTTT TTTGAGACAG GGTCTCGCTC TTGCCCAGGC TGAAGTGCAG TGGCACAATC 
23821 ATGCTCACTG CAGCCTTGAC CTCCTGGGCC CAAGTGATTC TCCCACCTCA GCCTTTTGAG 
23881 TAACTGGGAC CACAGGTACG TGCCACCACA CCCGGGTAAT TTTTTAATTG TAGAGACAGG 
23941 GTCTTGCTAT GTGCTTAGGC TGGCCTTGTG AACTCCTGGG CTCTAGTGAT CCACTAGCCT 

24001 CAGCCTCCCA AATTTATGGG ATTATAGGCA TGAGCCACCC TACCTGGCCT GTTCCCTGAA 
24061 TTTTTTTTTC TTTCAGGTGT TTGTGCATAT GTGTGTGTGT ATGGGTATAA CAGAGAGACA 
24121 GAGAGAAAGA AACTTTTCTA TCTCACTTTG CAATCAGAAG TTTGAAGTCT TATCTTTTGG 
24181 CTTTTGTTTC AGAAATATTT CAAATGTAGA CTCTCTCCTT TACCACACTG TCCCCTTAGG 
24241 CAAGGTCTTT GCCATTCTTC TGAGACTATT GCAACAGACT CCCAACTTCT GACTGTGGGC 
24301 CCTTCTCAAA AATGATTGTT TATGCAATAA ATCTAAACCC AAGACAACTA CAACAATACA 
24361 ACAAATTCTC TGCTTAAAAA CTTCCAATGT CTGCCGGGCG CGGCGGCTCA CGCATGTATT 
24421 CCCAGCACTT TGGAGGCAGA GGCGGGCAGA TCACTTGAGG TGGGGAGTTC GAGACTAGCC 
24481 TGGCCAACAT GATGAAACCC CATCTCTACT AAAAATACAA AAAATTAGCC AGGCATGGTG 
24541 GTGGGCGCCT ATAATCCCAG CTAATTGGGA GGCTGAGGCA GGAGAATTGC CTGAACCTGG 
24601 GAGGTGGAGG T7GCACTGAG CCAAGATCAC ACCATTGCAC TCCAGCCTGG GCAACAAGAG 
24661 CAAAACTCTG TCTCAAACCA AACCAAAACA AAACTTCTAA TATCTACCAA ATGTTTCACA 
24721 CAAGTATTTG GGGATCTTCA CAAATGGCCC TTATGGAGTT TTCCTTTGCT GAGACCCTAT 
24781 GCTCTGGCCA CACTAAACTC ATTCAGCATC CCAGAAAGGC CTCAGCCTTT GTGAGCAAGC 
24841 TCTTATCTCC AGGCCTCTCA CAAAGACCTG TTCCAGTAGA AGCTCAGGGG AGCACACTGG 
24901 ACATTATTCC AACAACCCTT TCCCCACAGC TATGCAGCCA AATCTGCCAG CTCAGTTAAT 
24961 TAATTAAGCA ATTCAGAGAT GAGGGTCTGC CCAGGCTGGA GTGCAGTAGC TGCGACCTCA 
25021 AGCTCCTGGG CTCTAAGTGA TCCTCTTCAG TCTACCCAGA AGCTGGGACT GCAGGCATGT 
25081 GCCACCACAC CCAGCTAATT TTTTTTTTTT TCAGTAGGGA CCAGGCCAAC CTAGTCTTGA 
25141 ACTCCTGGCC TCCAGCCTTC CGAAGTGCTG TAATTACAGG CATGAATCAC TGCGCCCAGC 
25201 CAACCCGCCC AGTCTTGTTA GACATGGGGT CTGTAGTTTC TAGTAGGTTC TTGAGTCTAG 
25261 GGTTCCTACC TCATGTTTTA TAGTTAATTT AGGGGAGGGA CTGTGTCTGT TTATCTGGGG 
25321 ATGTAGGGGT GGGCAGGGGG ATAGAGGGGA CTTCAATTAA TGAAACCAGA AGCAAAACTC 
25381 AGTTGAGGAC ACCGGTCATG AGAGTGGCCT GATTATGGCC AATCTTACAT AATGTGTGAG 
25441 ATCTTGATAT TACCCCATCC TTGAGAGTCC TCTATAAAGC TACAGGGACT TGGGAGCACC 
25501 TTTAATTACA GACAACCCAT GTTCCTGTGG ATTATGATTT ATTAGATTGC ACATGCCTAA 
25561 ATAAAGACAT CCTCTGCAGT CTTTTGACAA TTCTATAAGC ATCTTCTGAC TCCGCAATTA 
25621 GACAGCTAAG AGATCTGTGT TACTTCCCTC ACATATATAA ATAATTTTAA ATAAAAATCA 
25681 TGGCGTGAAT AATTTCTTTC CTCTACCGAT TTGAAGCTAT CCATTTGGAA GACCACTCTG 
25741 AAGAGATGAA ATAAGTCTTC TGCCAAAGAT TACTTATTAA TTTACAAGGA AAAGGGGAAG 
25801 TTTTGTTCCT CTCCGTGAAT TTGATTGAAA ATCGAGGGCT TTCTCGAATA GTTTTGGCAT
25861 CCAGGGTCAT TTTTCATTAA AAAGAGAAAA GTCATGTCAA ATATGAATTT CCGCAGATTA 
25921 TTCAGCACTA GACCCTGGGA GATTCTGTAA AGAGGGGTTT TGTTATACTC AACTTTTCCG
25981 GGTAAAACAA ACACAAATAC TCCTCCTCCA AGGGGCGGGG GCGGTGCCTA GGTGATGCAC
26041 CAATCACAGC GCGCCCTACC CTATATAAGG CCCCGAGGCC GCCCGGGTGT TTCATGCTTT
26101 TCGCTGGTTA TTACATCTTG CGTTTCTCTG TTGTTATGTC TGAAACCGTG CCTGCAGCTT
26161 CTGCCAGTGC TGGTGTAGCC GCTATGGAGA AACTTCCAAC CAAGAAGCGA GGGAGGAAGC
26221 CGGCTGGCTT GATAAGTGCA AGTCGCAAAG TGCCGAACCT CTCTGTGTCC AAGTTGATCA
26281 CCGAGGCCCT TTCAGTGTCA CAGGAACGAG TAGGTATGTC TTTGGTTGCG CTCAAGAAGG
26341 CATTGGCCGC TGCTGGCTAC GACGTAGAGA AGAATAACAG CCGCATCAAA CTGTCCCTCA
26401 AGAGCTTAGT GAACAAGGGA ATCCTGGTGC AAACCAGGGG TACTGGTGCT TCCGGTTCCT
26461 TTAAGCTTAG TAAGAAGGTG ATTCCTAAAT CTACCAGAAG CAAGGCTAAA AAGTCAGTTT
26521 CTGCCAAGAC CAAGAAGCTG GTTTTATCCA GGGACTCCAA GTCACCAAAG ACTGCTAAAA
26581 CCAATAAGAG AGCCAAGAAG CCGAGAGCGA CAACTCCTAA AACTGTTAGG AGCGGGAGAA
26641 AGGCTAAAGG AGCCAAGGGT AAGCAACAGC AGAAGAGCCC AGTGAAGGCA AGGGCTTCGA
26701 AGTCAAAATT GACCCAACAT CATGAAGTTA ATGTTAGAAA GGCCACATCT AAGAAGTAAA
26761 GAGCTTTCCG GGAGGCCAAT TTGGAAAGAA CCCAAAGGCT CTTTTAAGAG CCACCCACAT
26821 TATTTTAAGA TGGCGTAACA CTGGAAACAA GTTTCTGTGA CAGTTATCTA TAGGTTTAAG
26881 TTGTGATGCA GCTGAGTTGA AAAGGCTTGA GATTGGAGAA TTAATTCAGG CCAGGCTTCA
26941 AGACCATCCT GGGCAACATA GCCAGACTAC CATCTATACC AGGGGTCCTC ATTTCCCCGG
27001 CCACCGACCG GTAACCGGTC CCTGTCCATG GCACGTTATG AATTGAGCCG CACAGCTGAG
27061 GGGTGAGCGA ACATTAACCA ACTGAGCTCC ACCGCCTGTC AGGTTAGCTG CAGCATTAGA
27121 TAGATTCTCA TAAGCTCAAA CTGTATTGTG AATGGCACAT GCAAGGGATC TAGGTTTCAG
27181 GCTCCTTGTG ACAATCTAAT GCCTGATGAT CTGAGGTTGG AGCAGTTTTA GTCCGGAAAT
27241 CATTGCTCCC AGCCCCTGCA CCCCCTGGTC CGTGGTATAA TTGTCTTACA CAAAACGGTC
27301 TCTTGTGTCA AAAAGGTTGG AGACTACTGG TTTTACAAAA AAGTAAATTA GTCAAGCATG
27361 GTTGGCACGC TCCCTTAGTC CCTGCACCCA GGCGTTTAAG GATACAGTGA GCTATGATGG
27421 TGCTACCTCA CTCCAGCCTG GGTGACAGCG AGTCAGACGT TGTCTCAAAA CTTAAAAAAA
27481 AAAAAAGTTA AAACAGAAAA AGGGCTTCTT GTCAGAGACT GCCGTATATC TAGAGGTCCA
27541 GGAACTAAAA AGTCTGATGT CCAATCCTGA AAAGCTCGAT GGTGCACTAG AGGAGGCTTT
27601 TACATGTAAG AGCATCTAAG TTCTGGAAAT GCCAGTGTCA GGGAAGGGAA GTGGAGAGCA
27661 ATTTGGCATC CAAACATAAC TTGCTGATAC TTTTTTTTTT TTTAACACAA GTACTACATT
27721 CTAGTCTTTC TGTGGTGTCA TTGTAACTAT TGTTTCTTAA TATGCTATCC ACTGACTTCA
27781 AGGGATCAAT AAATAGGAAT CAAGGTGTCC CAGAATATGG ATTAGGGGAG TTTTTTTGTT
27841 GTTGTTGTTG TTGTTGTTTT TCATCTATTC ATTATCCTGT AGCTGAAATT TAGAATTTTC
27901 TTCCATTGTG TGTGACTGAT AGAAATAACA AATTTGTAGG TTATAGTTGT TGCAAGAATC
27961 TGGAAATCGT GCTTGCTTAT TTCCGAAGTA CTATTAGGTA TATCAACAAA AACACACATA
28021 TTACGGTCAA GTGGTTTGAT AATTATTTTA ATATTATTGG TCTAATACAA TTGTAACCCT
28081 ATGAATTACT TTAAGTATCT TATTTATGAA AAGAATCTGT AAGTTTCATC AGACTACCAG
28141 AGCATACCGA AGACTGAAAA ATTTTAAGAA TCCAAACCTT AATGGAAATG TTGGAGGCTG
28201 CCCAATTAGG TTCTGAATTC CACCTTCCTG AATCACAAAC TTGTTTTAAC TCTCAGTCTG
28261 AGGTAAACTA CGTTTCTCTT TAAACAGACA TAGTTTAATT TTCCTTTGAT TTTTGATTTA
28321 GTATTCTTAC TGATCATCAT AAATAACCAA TGCTAATGTT AGTCTACTTT GGACCATGGT
28381 ATTTCGAGAA ACTTTGAACA AAGTCCCCTG CAAAACTATG CATTGCATTA TTTCACATAC
28441 ATTTATGTTT TCCAGACGGT TCAATAGTAC CTCACTTTTC TQAACTTATT TGTATAGTTT
28501 GGCATCTTTT TAAAAATTGT GTCCTATAAT GAAAGGTTGT AAACATTATG TTTTAAATTT
28561 GTATAGATAA AATCAACCAC AGACCTTTCC TTGCTTGGAT GTAATTGCCA TTGTTTCCCA
28621 ATGAGTTCGG AATTACTAGG ATTGTGCAAA AATATGCCTC ACTTGCCTGA CATAGCAGAG
28681 AGCCATTTTG CCTAAATGCT GTGCCCAGCA ATGGACTGTC ACCAGATTCT CATCACATAC
28741 AGTGAGGATG AACT^CTAGC CTCTCCCAGC AGCTGGCCGG TCTCTCAATA ATATGGGACT
28801 CCCTCAAGAT GGCTTCCTGC ACCTTTGCTC CTCTAGCCTT GTATGTATAC AAGGCTAGCA
28861 TGCCTGGCAT ACATAAGGTT AAAAACAAAA TCAATAAGTT ATGGTTCTTC CTCCAGTTCT
28921 GGGGATTATT AGACCACTTT TTTGTTTTGT TTTGTTTTGG ATGGAGCCTC GCTCTGTCAC
28981 CCAGGCTAGA GTGCAGTGGC ACAATCTCGG TTCACTGCAA CCTCTGCCTC CTGGGTTCAA
29041 GCAGTTCTCT GGCTCAGCCT CCCACGTAGC TGGGATTACA GGTGCCCGCC ACCACGCCCG
29101 GCTAATTTTT GTATTTTTAG TAGACGGGGT TTCACCATCT TGGCCAGGCT GGTCTTGAAC
29161 GCCAGACCTC GTGATCCACC CACCTTGGCC TACCAAACTG CTGGGAATAC AGGCGTGAGC

29221 CACCGCGCCC GGACTTAGAC CACTTTGTTT TGGCCAATAG GACAACAGCC ATAGAACCCT

29281 CCGCAAATGA GAGCTTGTCC CTAAAGATGC TTTATTTACA TAGCTGTGTG CCGCATGAGC 

29341 CAAAAGGTGA TAACCTTTGT TCAACACGCG CCTCCAGCCC TTCGGTTAAG TCCAAAGTAC

29401 CATTCTTAGA ATGCTCTAAA ATACATAATT TTTrTTTTTT TTTTTTTTTT TTTTTTTGAG 

29461 GAGTCTCTCT CTGTCTCCCA GGCTGGAGGG GAGTGGCGCG ATCTCGGCTC ACTGCAATCT 

29521 CTGCTTCCGG GCTAGCTGGG CCTACAGGTG CAGACCACCA CGCCCGGCTA AGTTTTGTAT 

29581 TTTTTTTGGT AGAGGGGGTT TCACCATTTT GGCCAGGCTG GTCTCGGATT CTTGATCTCA 

29641 AGTGATACAC TAGCTTTGGC CTCCCAAAGT GCTGGGATTA CAGTCGTGAG CCACTGCGCC 

29701 CAGCAAAATG CTTTTTGTGG AGCCAATCAC TTTATTAGCG CTTACCTCTC TATGCCTACT 

29761 TTATGCTTTG AAATTTTGTC ACAGTGTGGC CGGTCATGGC AAACACAATT CATTCTTATG
29821 CAGGATGTCA CGGTTATTTC TGTCATCCAA ACTCATTCTC GCAACGCATT TCAGCTCTTT 
29881 AAACGACTTT GTGAGCGGCC CTGAAAAGGG CCTTTGGGTT TTTTTGTTTT TGTTTTTTGA
29941 AGTTCTCAGG AGACCGCGTA TTCTTAGATT CAGCCGCCGA AGCCATACAG AGTGCGCCCC 
30001 TGACGTTTTA GGGCATATAC TACATCCATG GCTGTGACAG TTTTGCGCTT GGCGTGCTCC 
30061 GTATAGGTGA CGGCGTCTCG AATAACGTTC TCTAAGAAAA CCTTAAGCAC ACCTCGAGTC 
30121 TCCTCATAGA TAAGACCGGA AATGCGCTTG ACGCCACCGC GCCGAGCCAA ACGGCGAATA 
30181 GCCGGTTTTG TAATGCCCTG GATGTTATCC CGGAGCACCT TACGATGGCG CTTAGCACCA 

30241 CCCTTCCCCA AGCCTTTTCC GCCTTTGCCG CGACCAGACA TGATTCCTAT CGCAGTGGAA
30301 GGTATGAACT GAAACAGTTC CTTAAATACA AACTTGGCGG ACCTGATTGA AAACAACATG
30361 AGTTGGCGCG GTTTTTTTTT TTTTTCAAAT TTGGTCACCA AGTGGGTGGA GCAAGAAAAA 
30421 CTGTTTCATT ATGGTTCATT GTTTTGATTG GCCAGTGACA GCTTGCTCTT TGTGGGAGTG 
30481 GAAGGGTGTT TGCAAGTTGA ATGCGCTGTA TTCCTGTCAG CTTAATGACG CTAAGCATAG 
30541 CCCCATTCCA CATTTCTTTT TATTTCCACT TGCTAACTAA TAAATTACGG AATAGTTTAT 
30601 TGGGGAACAT ACAAATAATG TTTAAAGGAG GTCAGATTTA TAGGTCAAGG GATTTACCCT 
30661 CCCAATCATT TTAATATTTT TATTTAAACC AGGCATTTTG ATGGCCTTCT CTGTGCTGGA 
30721 CAAGGTATAA GTTTGGCTAT GAAGTTTCAC TCCTAAAGAC CCTATGTTTT GGGAAGGCAA
30781 AAAGGTAGCC AAATAATTGC AAATTAAAAC CTCATAAGTG CAAACTTCTT CCTCGTCACT 
30841 TTCCCTATCT CGATTCAAAT ATTTGTTGAA TGACTCATTT TTCTGCAAAA GTCTGAGAGA
30901 GACAGGGAAT ATAAACTTAA GTCTGGATAA TATGTTTTCC CGGGACGCTC TTCCTGGTCT
30961 GCTGTGCCTG TTTGCTGTGC CTGAAATTCC AAACACTCTT CCCTTCCCTC CGTTTTTAAT
31021 CCCCTTTCAA CTTGCTACAG CTTTAGAGAA AAGAACATTC GTTTTGTACA GTTGGGGATT 
31081 AATTGAAGTG TAGGGCTAAT ACTTGATTAA GGTCATTACA AAATCTACAG GGTCTTCCTC 
31141 TGGGAGGTT7 TTGTGATAAG ATTATTGGTG TTAAAATAAG GCTAATCCCC TTGAAAAATA 
31201 AATAGAATAG CAGAATTGGG TCTGAATGTG GTTTGAAGAA AGGGACTTCT CAATTCAAAA 
31261 TTTTATTCTT AGCTTCCTGC GGGAGCTTTC CAGAATGCCC ATAAGATCCA CTTTTGTTTA 
31321 AAAAACAAAA ACAACCCCAC CCACCACTCT CTGGTTAATA AATGAATTTC TATTGGGAAT 
31381 ATTTAGAATG GGGCTGTGGC CTGTGAGAGA CATTATATAG TAACCTCAGA CTTGCTCACA 
31441 TGAAGAGAAG AAATCCAGGA ATGGAGAAAA AAGACCCAGG AAAGGCCAGA ATGCTCTACA 
31501 TGTCATATTG TTTGTATCAC TTCTGAAATA ATTGATTACA TTCTTCTGCC CCAAATTGAG 
31561 TTCTTAGGTT CTTCCACTCA CTGTCCACAT GCCACAACAC AGACCTTATA ACTAGAGACT 
31621 TAGCTAGGAA GAAATGTCAA ACATTACAGA GAAAAAATGC AGAGTCTGAG ATCATAAGTA 
31681 AAACTCTGAA ATCTCAACAT GCCTTTTAAT TCATGAAAAT AAAAAATATA GCAGCATATG 
31741 CAATATGACA ATTCTCTGAA AACATACATC ATGTGAACTA CCCTGGAACA CATCTCGCCA 
31801 AGTGCCATCT TCATTTTAAC CAGAGGTCTA GGATGCCTTT CCTTTATTTT GCCTATTATA 
31861 TCATTTATAA AACCCCATTT TTATTTTGAT ATTTTATTTA CTTTCTATTT CCTGCTCCTA 
31921 ATATCTCCTT TCTAAACTTT TCTCAATGAC AGT6ACTCAA AAACAATGAA TGTCAGAACA 
31981 AATATTTAAA GGATCTGTAC ATGTAGATAT ATATATTTAA AATGGATTCT TCCACTCTGC 
32041 GAAGAATTCA GGCATACTCA ATCTTATGGT TAGGGAGAGA TTAGGCTCAC TCGCCTAATC 
32101 TGTATGGCTT CTCGTTCGCT TTCCATTTCA CCTTCCTCTC ACCCATCAGA TCAAACTCAT 
32161 TCATTGAACA AGAGACCTAA GCCCTTCAGA TTAAAACTCT GCAAACAAGT TGTGGTTGAG 
32221 AGGATACATG AAGCATTCAA ACAAATAAAT CTATGATATT AATCAGAGGT TAATCTATGA 
32281 TATTAATCAG AGGTTAATGC AGTGGCTCAC GGCTGTAATC CCAGCACTTC AGGAGGCTGA 
32341 GTTGGGAGAA TCGCTTGAGC TCAGGAGTTC AAGACCATTT TGGGGAACAT AGCAAGTCTT 
32401 CATCTCTACT TAAAAAAAAA TAACCAGAGG TGTTATGAAA ATATAAATTG TCCAGAACTA

32461 CCCTCCACAA ACTAACTCTC TCAGAATATT CGATATGAGG AATGAAATAT GGTGTGTGTG 

32521 TGTGTGTGTG TGTGTGTATG TGTGTGTGTG TGTGTGTGTA TGCACCTATA TATGGCACCT 

32581 ATATATTCAA CAAACAATTC TGATAATTGG CCAGGGTTGA GAATGACTAG CAGCCCAGCA 

32641 TACACTATCA GTTTTAAGTA TATAATTGCG CTTTAGTAAA ATGTAAAGAA ATCCCAGAGT 

32701 AGAAATACTT TTAAGCTATA TTACAGGTGA GAAAATGCAT AAGTATAGTC TCACCCAACT 

32761 TAGACTATGG GGGCTTTATA ATGTCACAAC AGTTGTTTCC AGGCATTTGG GGACATCACC 

32821 ACTGGTCTTG GGCAAGAAAC TCCTCTAGCC AATGGCTGAT TTATCTCACT CCCATCTAAG 

32881 GCTTCACTGC ATTTCTCTTT TTCAGCAACC TAACTTATTT AAAAATATCC ATTTTCTGAT 

32941 TCATTTTTTT CTGAATTAAA CTGTCAGTAC CATTGGCACA CCTTTGGTTC CGTAGCATAC 

33001 CTGTGTCTCT GCTGTGTTTT TTTTTTACCT CCACTCCTTA CTTTTCTAGA AAAAAATCTC

33061 TGCTTTTTCT TTTCAGTTTA AATTATTTCA CAAAAAGTTT TCTTGACTTG CACTTCCTAG 

33121 GCTTGCTGTC CTTGTGTGGG CACGCTCCCA TAAACACTAT TAATACACTT CGATTTGTTA

33181 AAAATAAAGA TATCTGGACA GAAAATTTCT TTTCTTTTTT TAAGATTTTA AAATTTTTAA 

33241 TGTTTATTTT TTTCCTAGAC TGGAGTACAG TGGCACCATG ATGGCTCATG GTAGCCTACA 

33301 CTTCCCCGGG CTCAAGTGAT CCTCCCACCT CAGCCTCCCA AGTAGCTGGG ACTACAGGTG

33361 TGCACAACCA CACCTGACTA ATTTTGTTTA TTTGTTTGTT TTGTTTTTTG AGATGGAGTT

33421 TCGCTCTTGT TGCCCAGGCT GGAGTGCAAT GGCGGGATCT CGGCTCACCG CAACCTCTAC

33481 CTCCCAGGTT CAAGCAATTC TCCTGCCTCA GCCTCCCGAG TAGCTGGGAT TACAGGCATG

33541 CATCACCACG CCCAGCTAAT TTTGTATTTT TAGTAGAGAC GGGGTTTCTC CATGTTGAGG

33601 CTGGTCTGGA ACTCCTGACC TCAGGTGATC TGCCCGCCTC GGCCTCCCAA AGTGCTGGGA

33661 TTACAGGCGT GAGCCACCAC GCTCGGCCAC TAATTTTGTA TATTTTGTAG AGATGGGCTT 

33721 TCCCTGTGTT GTCCAGGCTG GTCTTGAATT CCTGGGCTTA AGTGATCTGC CCACCTTGTC

33781 CTCCCAAAAT GCTAGGATTA CTGGCGTGAG CCACCAGGTC TGGCTGGAAA GATAATTTCT

33841 AACATTATCC TCTCTTAAAC ATTTGTTTCA AAAATTTTAC AAACATGAGA GTAATTAAAT 

33901 TTGATTTTCA AAATTCCCTT GAATACTTTC TTAATAGCAC ACAGAAAGCA CAAAGTATTT 

33961 TACATTTGTT TTAATGATGA AATTGTGAAC CCAAACTTAC ACAAAGAAAA ACCCGTAACA 

34021 TTATACCCAT ACTTAAAACA GATGCCCTCA TATACATAGT AAAACTCTTG GGGGCAGTAG 

34081 TGAAGTTGGT TATTTACTGT TTTATGAAAG TGCCATTCAG CCGGGTGCAG TGGCTCATGA 

34141 CTGTAATCCC AGCACTTTGG GAGGTCGAGG CAGGCTGATC ACGAGGTCAG GAGTTCAAGA 

34201 CCAGCCTGAC CAAAATGATG AAACCCTGTC TCTACTAAAA ATACAAACAT TAGCTGGGCG

34261 TGGTGGTGTG TGCCTGTAGT CCCAGCTACT CAGGAGGCTG GGGCAGGAGA ATCGCTTGAA 

34321 CCTGGGAGGC GGAGATTGCA GTGAGCCGAG ATCGCACCAC CGCACTCCAG CCTGGGAGAC 

34381 AGGGCGAGCT CCGTCTCGAA AAAAAAAAAC AAAAAAGTGC CGTCATAGTG ACTCAGTTTT 

34441 AAGGAATAAA TCAAGGATAT TTAACTCAAT AGACTACAGT TAGCTAACGT GACTTGCACT 

34501 GAAAGTTATA CGAATATTGG TACTTATTCC CCTGCCCCTG AAGTATGAAT TAAAGACTCC 

34561 AAAATTCTTT TTAGAATGTT CAGAGTAAAA GCTAGAATTT GATTTTTTTA AATAATAAAA 

34621 AAATACTTTG TATCTAAATC TGGTGTATAA AATAACTTGG TGGATGATGC TTCAAGGCTA 

34681 TCCATCCCCA AATTTCTCCC TGAATGATAA AGAGAATAAA TGAATATGTC AATTCAAAAG 

34741 TTAGAAATTT GGCCGGGCAC GGTGGCTCAC TCCTGATAAT CCTTTCGGAC GCTGAGGTGG 

34801 GTGGATCGCA TGAGCTCCGG AGTTCAAGAC CAACCTGGGC AACATAGCCA GAACCCGTTT 

34861 CAATAAATAA TAGAAAAAAA TGAGCCAGGC GTGGTGGTCC CAGCTACTCA GTAGGCTGAG 

34921 GTGGGAGGAT CACTTGAGCT CAGGAGGTCG AGACTGCAGT GAGCCGTGAT CGCAGTACTG

34981 CACACCAGCC TTGGTGTCAG ACTGAGACCC TGTCTCAACA ACAACAAAAC AAGTTAGAAA

35041 TTTGGCTGGG CGCGGTAGCT CACGCCTGTA ATCCCAGCAC TTTGGGAGGC CAAAAAGGGC 

35101 GGATCATTTG AGGTCAGGAG TTCGAGACCA GCCTGGCCAA CATGGTGAAA CTCCATCTCT 

35161 ACTAAAAATA CAAAAAAAAT TAGCCGTGCA TGGTGGCATG CGCCTGTAGT CTCAGCCACT 

35221 TGGGAGGCTG AGGCAGGAAA ATTGCTTGAA CCCAGGAGGC AGAGGTTGCA GTGAGCCGAG 

35281 ATCATGCCAC TGCATTCCAG CCTGGGTGAT AGAGTGAGAC TCCATCTCGA GAAAAAAAAA

35341 AAAATTCTGT ATGAACTGAA CAAAATATCC TTAAATTTTA AAATACATCT GAAAGATATT 

35401 TCAAAATATT TAGGAAAAAA ATTATAGGGA TCAGGCAAAT TCTGAGATTC CTTTTTCCCT 

35461 GCAGCAAACA TTAGGAGTGC TGCTGTTCCT AAAAACATGG TAACTGTTGC CACACCGTAT

35521 GTTTCCTTGG CTCAGACATA AGGTTGTGTA GTTGTTATTC CAGAATAGCT AGAATAAAAA

35581 TCCAGCACAT CATTTTCTTC AGCAAGTTAA CTAACCTCTC TGTGCCTTGG TTTCATAACA 
35641 GCAACATAAG CATAACAGAA TAGCAGCAAT AGCTCCTACC TACCTCATAA GATTCTTTGG

35701 AGGAATTAAA TTAAGATTCA GAACACAGCC TAATATCTAG TAAGTAATAA TAATTGGCTA 

35761 AAAAAATTTT CTTAAGATTA TATATATTCA TGGGGTACAA GTACAATTTT GCTACATTAA 

35821 TATATTGCAT TGTGGTGAAA TCAGGGCCTT CAATCCATCC CGGAAAAAAA AAGTTTTTGA

35881 AAAGATTTCT GCCATGGAAA ACTrTTAATG TACAAATTCA TCCATCCAAG AAATAGAAAA

35941 TATATAAGTA TCAACTCCAA ATCCACCATA TCTATCTCTT CTACACCTTA AACAATTACT

36001 CAGAAATAGA ATGCTTGAGA TACCAGAATG CATGCATATC AAGTAATAAA TGCATGCAGG 

36061 ATGTCAACGC ATCCTAGGCT TTCAAATAAA ATTGTCATAC AAAATACTTT AATATTGTAG 

36121 TAACATTCTA CATGTTAGAG TGTAGAAGTT AATCGCTGAT GCAAAAAAGG AAAAGAACAC 

36181 ATTATACCCA AAGCCTACAG AGAGAATCAC AATTACAAAT ATCAGCCTGC ATGTGAAAAT 

36241 CTTTAATTTG AAAGTCAGAA ATATTTAAAT GATAGTCATT GTTAAATCAG ATTGTGGTTT 

36301 GAAAAAAAGT TAGTTTAAAA CTGAGTTTAT GAAAAATTTG GGGATTTTAG AGACAGTGTT 

36361 TTGTTTTTAA ATGTGTGTGA GTTTGTGAAG AATGTTTTAT AAAATACTGA CAGTATTATA 

36421 AGATGACATT ATTATAATAC AACATAAGAA TTTTGGCCTG TACCTCTCAG CAGTCCTCAA 

36481 TCACCTGCTG TACTTGACTC AATGATTATC AGAGTGGTTT GTTTTCCTTC TGTTGTGTTC

36541 CCAGTTCAGG CAGCTCAGCA ATGGCCTGTG ATTCCAGCAA TTCAAATAGC TGGTAAGTAG 

36601 TTTCTTGTTT GTTTTCTCAA ATTTTCAGGG GCTTTTCTCT ACAAGTGATT TCCAGTGCAC 

36661 GCCCCTCCAC CCATTCTTTA TTCCTTTACC TTCAGGAAAA CCCTCAGCGC TGCATCTCTG 

36721 GTCACCGGAC CACCGTGGTA CATTTACCTA TGGCCACCAG GTGTCACCCT TCTCTTTACT

36781 ACCATGGTTT GTGAATGGTT TTGCCAGAGG TGAATAAGAA TTTAAAATGC AGGTCTTTGA 

36841 TTTTTCAAAT GTAGTTGACC TTAAGAATTT ATGAATAAAG CCAGAAAAAT TAAGCTTAAA 

36901 AAACACCGAA AGAAAATGAG GACTTAAAAT TTCTATTAAA AAAATTAACA GGCCACAGTT 

36961 GCTGATGTTT AGTAAATGTG rTAGTGAAAT GTGTTACTGT GAAGACTGGG GTGTTTCTTG 

37021 AAATCTCAGC CCAGGTGAAA TAAAACCAAT ATAAAACAAA TGCTTACCTA ATAAATTAAT 

37081 TGTAACATAT TCCTTATGAG GTAGAAGAGT AAGTGAAGCC TTATAGCAGT CTGCTTTCAG

37141 TATAGTAAGA TATTAAGAGA GAAATAATTT GTCATATGCT TTCAGAATGG TTTGCTGGTA

37201 AAATAACCAA TGTCTTACAA CTTAGACGAC AATGTCCCTA GAGTGAAGAA ACACGATTAA

37261 TTCGGCTACC ACAGTTGAAT GAAAATATTC CGTAAGACAA AATGTAAAGA AATTAGAAGC

37321 AAAATAAATG TCTCCAAAAT GACAAAGCGA TTAAGTATAT ACACAAGATG AACAAGAACT 

37381 TCAATAAAAT CATGCAGTAT ACAATACAAT ATACATTTAT TAAAGTATAT GCATTTTTAA 

37441 TGCAACAATA ATACTAACAG GTAATAGACA AGTTGTTAAT AGTTTTTCAC TGGCTAATTA 

37501 AATAACAGCT TTAATTGTAT TCATTTTATA GCTTTTCTAC AATGAGCGTA AATCACATTT

37561 ACTTTTTTCT ACATAACTTT TCTAACCACA AAAAAAGAAA ATGGTTTAAA AGAAGAGATG 

37621 AGATATCTTT GCTAAAATTT AATGCCTAAA GAAGAAACTT CTGAGCTGTA TATGGTATCC 

37681 TGAAGCACCT GCCCTTCAAG ACAGAATGCT TGTACCACAT TTATGCAGCC AAGTGCATGT

37741 AGTAACATAA AGTAAACACA TGCGATCTGG ATATATATAT TAAGACTCTT TTGACGGCTG 

37801 GGCAGGGTGG CTCACACCTG TAATCTCAGC ACTTTGGGAG GCCGAGGCAG GCGGATCACG

37861 AGGTCAGGAG AGTTCGAGAC CAGCCTGGCC AACATGGTGA AACCCTGTCT CTACTAAAAA 

37921 TACAAAAATT AGCCGGGCAT GGTGGTGCAC GCCTGTAATC CCAGCTACTT GGGAGGCTGA 

37981 GACAGGAGAA TCGCTTGAAC CTGGGAGGCA GAGGTTACAG TGAGCCGAGA TCATGCCATT 

38041 GCACTCCAGC CTGGGCAATA GAGTCTCAAA AAAAAAAAAA AGACTCTTTT GAACATGGTG 

38101 AACTGATTTC CCAGAATCTA GCAATTCCTG AATGTCCTGG TTAGATTTTT TTTTTAATGT 

38161 GCACCGQAAC CCCAGTGGCT CCATGGAAGG ACCTGGGCAT CCTCTAAGCC ACTTGGTGGC 

38221 TTCCATTATA CCATCTCAAA ATGAGAGAGC TTACTCCACT TCATTGAGGG AAATACCACC 

38281 AGAGTTCTGA CTCCAGAGGC ACTGGCCTAG GGAGGACACC GTQTGTGAAG CCCAGCAGGG

38341 CCACTAGCTG TCCCCACCAA TTACAGTCCT TGCGTAGGGT CCAAAGAAAT GAATGCCAAA

38401 GAGAGCAACA GAGGAGCAAG GGAGTCACAT TCCAGGACCT TCCTTCAGGG ACTTTTAAAG

38461 GAAACATGAC AGCTGAGGAT CAGTTGGTTG TTTTCTGCTG TTCCCCTTCA TGTGATTCAA

38521 GCTCACTCAG AAGAAACACA ATGAGACAAG AGAAGAGCCA TCTCCTTCCT TCTCTATTTA 

38581 TTCTAGGCAT CTAAACTACT GAATGTAGTG GTGTCTGAGA TGTATCAAAC GGTCAGATTG 

38641 ACTGAGTTTG AAACCTGTTT CTATCACTGA CAAACTATGA GATACTCTAT ACTTCACTTT 

38701 CTTTTTTTTT TCATTTTTTT ATTTTTATTT TTATTTTTTT GAGATGGAGT CTCACTCTGT 

38761 CACCTAGGCT GGAGTGCAGT GGCGCAAACT CGGCTCACTG CAAGCTCTGC CTCCTGGGTT 

38821 CATGCCATTC TCCTGCCTCA GCCTTCCGAG TAGCTGGGAC TACAGGCGTC TGCCACCACG
38881 CCCAGCTAAT TTTTTGTATT TTTATTAGAG ATGGGGTTTC ACCATGTTAG CCAGGATGGT

38941 CTCGATCTCC TGACCTCGTG ATCCACCCGC TTTGGCCTCC CAAAGTGCTG GGATTACAGG

39001 CGTGAGCCAC CGTGCCCGGC CTACTTCACT TTCTTCATTT AAAAAAGAAA TGGGGATAAT 

39061 AGTACCTATG TCATAGAATT ATTGTAAGAA GTGCATGCAG TAATGCATGT AAGTAGGTGC 

39121 TCAGAAGAGT CGGACACGAA GTAAGTGCTT TTATCATCCT TATCATAATT TTCATTATCA 

39181 GAACAAGGAG AGACCAGGTA GAAAATTATT GTGATTCTTC AGGTCTGGAA TACTAGAGTA 

39241 GCATCCCAAA TGAAGGCACC ATTAAACTTT GCAAATCTGT ATGACACCTT CATGCCAATT 

39301 AGAAAAAACA CCTCTTCACA ACCCCTTTCA AGATATTTGC CTCCTACCTG CTAAAAACAC

39361 CCATCATACT ACCCACAGAT AGCCATGATG CTTTTTCTGG GACAGGTGCC TCTTCCATTC

39421 GTGCAGTGTA CAGCCTTCAT AGCTGTGCAA CTCACATCAC AATCAGATGG AAGAATCCCC

39481 AAGGCTTGGT GACAGATGAG TTACTGGGTA ACACAGAGAG AGGATTCAAA GGAAAAGTTG 

39541 AACGGGTCCA GAAAATGCAT AGATACATGT GTAAAAATCT GGTAAGGTTA TGACTAGCCA 

39601 CGTCCCAGGG TTCAAAGCTT TTCTCAGATG TTAAAATGAA TCATGTAAGT CCCCCAAATT

39661 TAAGGAGTCC TCTTCCAAAA ATAGGAAATG AAATGACATA GGTGTATGTC TCTGAGGTGA 

39721 CGGAGGAAAT GAAGGAAGCC TCTAGATGCA GCTTGAGGTT CATGAGAGAC AGTTCCAGGG

39781 GAGAGGTCAC AGCTAGGGAT CACCGGCATG CAGGAACTCA GAAACCTAAA TGGGGAAATC

39841 TTTTTGAGGA AATGAACAGA GAAGGCTAAA ATCAAGGAGT TCGTCAGGCA ATTTCTATGT

39901 TTAGGTTCAA CTCTCTCCTG AAACATGAAG AGCTCATAAA TGCACTCCCT CTTTGAGTCT

39961 CTAGTTTTGT CTCCTTCCCA CAGTGAGTCT GCAGGCTGCG TGTCACTCAC GTTCAGCTAA

40021 GACGTAGTGC CCCATGGCTC CTCCTGTGGA GACAAGAGAC CCAGGAAAGA GGCATCACAA
40081 ACCTAGGCAC CATCTTGCCT CTTCTCTCTT CCTTATTTTC CTCATTCACC CATCTCAATT
40141 TAGACCTGGG CACTATTGGA TTTCAAGAAC CATTATCTCT CATCTGGAAA TGCTTATTGG 
40201 CTTTCTAACT GGTCTCCTCA CCTCTCATCT AACTTCTTAA CAACACATTC ACCATATAAG
40261 GGAGATCGTG GTCCTCCTTT CTTAGGATCC TTCAATGACA CCCCAGTGAT CATAACCCAA
40321 TATCCCAAAA GACCCTTGGA CTCTGTATGA GCTGGCTTCT TTCTGATTCT CTTTTCCCTA
40381 CACCACAGAT GTTCAGGGGG TAGAAATGCA TAATTGGTGA GTGATAGCTA CGCAAACTCA
40441 GGGTTAAGGT ACAGTAATTA TTTCTAATCT CCCAGTATGC CTTATACTCT CCTACTTGGC
40501 ATGGTTGCTC CGTCTGTGTA GACCTCCCAT CATCTTCAAC CTCACCTAAT GGAATCCAGC
40561 TTCTCCTTCA AGATCCAGAA GGCTATCTTG ATCCCCAGCT GAATGTGATC ATTCTTTCCT
40621 TTGACACCCT AAGCATTTGC TTCCTGCCTG CTTTAGGACC TCATGGGGTC TTCTTTAACT 
40681 ACATTTACTT GCTATCAATT TCATTCCCTA CCAGATTTGG GTTCTGAGAA TAGCCACAGT 
40741 GACTTCTCAA CCTCAAAGCC CCTGTACTAC CTTAAACAGC TCTTGCAAAA TAGTAGGTGC
40801 TCTGAAGATG TTTGTTGAAT TAGAGACTTT CATTCTGGGG AGAACCATTA TTTTCTGTCT
40861 CCCAGGGAGC TGCTGGTGTC CCCAAAGAAT ATAAATGAGA AAAATGCTTC CCATGGATGC 
40921 CAGATCCCCT CTGCCCCTCT TCCCACTGTG CCCTGGGGCA GAGGTACTAA GAGACTTCCC
40981 CCTTGTTCCT ACTCACTTGA ACCCTGCCTC TTCCTTAATA TTATGAACAA AATTCCAATG
41041 AACAAGATGA CGACAAAAAC AGCAATTCCA CTGATGACTC CAATGACTAG GGTGCCAGAC 
41101 GGTGAGGGCT CTAAAACAGA AAAAGCAAGT TAAAGCCTTT GATTGCCACC CTCAGCCCAC 
41161 CCCCTAACAA AGAGCAGATC CTCATCTCAC TGCCATAATT ACCTCCTCAG GCACTCCTCT 
41221 CAACCCCCAA TAGATTTTCT CAGCTCCTGG CTCTCATCAG TCACATACCC CAGATCACAA 
41281 TGAGGGGCTG ATCCAGGCCT GGGTGCTCCA CCTGGCACGT ATATCTCTGC TCTTCCCCAG 
41341 GGGGTACAGC CAAGGTTATC CAGCCCTGGT AGGTCCCATC CCCATTGGGC AATACGTCTT 
41401 TAGGTTCGAA CTCCTTGGCA TCCATTGGCT GCTTATCCTT CAGCCACTTC ATGGTGATGT
41461 TCTGGGGGTA GTAGTTCAAG GCCCGACACC GTAGAGTGGT CACTGAAGAG GTCACATGAT 
41521 GTGTCACCTT CACCAAAGGA GGCACTTGAC AGGAAAGAGG AAGGATGAGG AGAGGGGATC
41581 TGTTTACCCT TGCCAGGAAG ACTGGAACTT TCACTTCCTT CTATAGGTTG GAGGAAGGAA
41641 ATACCCTTTT CAGAAAAAAA CAAGCTACAG GAGAGACACC ATTTTGTGTC CTAAGATTGG 
41701 ACTCTAACAC AGTGTCACTT GGAGAGCAGT CAGATCAGCT TGTTCTCCTC ACATGTAAAT 
41761 ATACATATCT GTTACCCATG TTCTTTGTTC TGATAGATAA AATTGCCCTT TATGTGCATT 
41821 GAAAATGATT GAATACAGAT GGTCAGTTTC ACCTGGGTCA ACCTAGGAGG CATTGTTATA 
41881 AGAAGCGGAC TTGTAAGATA GGTAGCTTCA GTGATTATTG CTATGTTCTA TGAAAGAAAC 
41941 TTTTAACCTA AAGGATTCTT CTACTCTGAT AAGTGGCCTC ACTTGATATT TTGTCCTGGT 
42001 ATTCATATGA TAGCTGAGAT CTCTGAATTC TCTTTTTTTT TTTTTTTTTT TTTTTAAGAT 
42061 GGAGTCTCAC TCTGCTGCCT AGGCTGGAGT GCAGTGGCGC GATCTTGGCT CAGTGCAACT 
42121 TCCGCTTCCC AGGTTCAAGC GATGCTCCTG CCTCAGCCTT CCAATTAGCT GGGACTACAG

42181 GTGCGCATGA CTGTGACCAG CTAATTTTTG TATTTTTTTA GAGACGGGTT TCACCATGTT 

42241 GGTCAGGCTG GTCTCAAACT CCTGACCTTG TGACCACCCG CCTCGGCCTC CCAAAGTGCT 

42301 GGGATTACAG GGGTGAGCCA CCGTGCCCGG CCTTGACATT TCTGAATTTT TAACAGGTAT 

42361 AAATATACAA AAGATTATTG GTTAAATAAA AAGCAAGGGC CATAGACACT TCCCTTTGAG

42421 CCATATGCAT GGAGAAAAGA AATTAAACCC ATGACTTGTG GCTGTCTCAT ACATCTCAAT 

42481 TATAAGGTAG AGACTCTAGG ATTGAGAAAG TCCCTTCCCA GAATTTGGAG AGGCACACAG 

42541 CCTCAGCCAC CTCTGAAACT CCAACCAGGG ATTCCGTGCC CTGCAACCTC CTCCACTCTG 

42601 CCACTAGAGT ATAGGGGCAG AAGTGTGTTT CCACCATACC TTGTTGGTCC AAAACACCTC 

42661 TCCCCAGCTC CAGCAACTGC TGCAGCTGTG CAGGGCAGTC CCTCTCCAGG TAGGCCCTGT 

42721 TCTGCCTGGC CCGAATCTTG TGCCTTTCCC ACTCCAGCTT GGTGGGCCAG GCCCTGGGTT

42781 CTGCTGCTCT CCAATCCAGT GTGTCAGGGC AGAATTCAAG GTGGTCCTGC CCATCATACC

42841 CGTACTTCCA GTAGCCCTCG GTACTGTTGT CTTCTTGCAT TTCACAGCCC AGGATGACCT 

42901 GCAGGGTGTG GGACTCTGGA AAAATCCCCA GCCTTGTTAA CTGCAACCAA AGGAATAGGT 

42961 CCCTATTTCC ACCATCCCCA AGGACCAAAT GATCTCAGGA AGCAAATTCC TTCCCTCTTC
43021 CCTGCTCCCA CAAGACCTCA GACTTCCAGC TGTTTCCTTC AAGATGCATG AAAAGATGAA 
43081 AAGCTCTGAC AACCTCAGGA AGGTGAGGCC CCCTCTCCAC ATACCCTTGC TGTGGTTGTG 
43141 ATTTTCCATA ATAGTCCAGA AGTCAACAGT GAACATGTGA TCCCACCCTT TCAGACTCTG 

43201 ACTCAGCTGC AGCCACATCT GGCTTGAAAT TCTACTGGAA ACCCATGGAG TTCGGGGCTC
43261 CACACGGCGA CTCTCATGAT CATAGAACAC GAACAGCTGG TCATCCACGT AGCCCAAAGC 
43321 TTCAAACAAG GAAAGACCAA GGTCCTGCTC TGAGGCACCC ATGAAGAGGT AGTGCAGAGA
43381 GTGTGAACCT GGAGACAGAG CAACAGGCCT TAACCATGTG TAGTAGGAGG GGAGCAGGAT 
43441 GTTGAGGCTC CACACACCTG CATCAACTCA TACCATCAGC TGTGTCTGGT CCTCATTTTG 
43501 TGAAGGGTGA GTTGCAGTCC TGTCTTTCTT CCATATGACA GTCCTGGGTG CTCTTTCCTT
43561 GTGTGCTTTT CTCTGCCACA CGTGGCTGCC ACCCCCTCAC TGCCCCCAGA TCCTATTCCA 
43621 ATACTCATGA TTAGACAGAC TCCACTAAAG CTGGTGGATT CTAGAAAATG TTAAGGTGTG
43681 TCTAGCCATG GTAGTTGAAC TCAGGAGTTG GTGCTCAGGG CAAATTAGAC CCAAATCCTG

43741 AGGAATAATT CCTTCAGTTT TTTTTTTTTT TrX' Tm -rrr TTTTTTGAGA CAGAGTCTCA
43801 CTCTATCACC CAGGCTGGAG TGCAGTGGCA CAATCTCAGC TCACTGCAAC CTGCACCTCC
43861 TGGGTTCAAG GGATTCTCCT ACCTAAGCCT CCTGAAAACC TGGGACTATA GGCGTGCGCC
43921 ACCACACCAG GCTAATTTTT GTATTTTTAG TAGACATGGG GTTTCACCAT GTTGGCCAAG
43981 CTTGTCTCAA ACTCCTGACC TCAAATGATC TACCTGCCTC AGCCACCAAA GTGCTGGGAT 
44041 TACAGAAGTG AGCCACCGTG CCCAGCCTTG GTCCTGAATT CTTACACTGA ACTGCCTATG 
44101 TGGCCTCACC ACTTGGAAGC CTGACTGGAA TCTCAAACTT AACATGTCCA AATGCAGATC 
44161 CTTGATTTAC CCCAAACTGC TCTTTCCTCT GCCTTCACCA TCTCAGAAAT GGCATTGCCA 
44221 ATTACCCCAC TGCTCAGGCC AATAAAATTA AAATAAAGAA CAAAGTCAAC TTTAACTCTT 
44281 CTCTTTTTCA GGGGGTCAGG GGAGACAGGG TCTTGCTCTG TCACCTAGGC TGAAGTACAG 

44341 TGGCACAGTC ATGGCTCACT GCAGCCTCAA CTTCCTGGGC TCAAGCAATA CCCTCCACCT
44401 CAGCCTCCCG AGTAGCTAGG ATCACAGGTG CATGCCACCA CACCCAGCTA ATTTTTGTAT 
44461 TTTTTGTAGA GAAGGGGTTT TGCTGTGTTG CCCAGGCTGG TCTTGAACTC CTGAGCTCAG 
44521 GAATCTGCTC TCCTTGGCCT CCTCCTTGGC ATGAGCTACT ACACCCAGCC AATTCTTCTC 
44581 TTTCTCTCAC ACAACATAGA ATCCTTCAGC AACTTCCTTC AGAATATATT CAGGAGACAA 
44641 TGGTTTGTCA CTCCCTTTTC TGTTCCCACC CAGCCCACTC CACTACCTCT TGCCTGGACT 
44701 GTGTAACAGC TTCCTGGCTG GGCTCCCTGC TTTTACTGTT GCTCCCTTCA TTCTGCTTTC 
44761 CACATAGCAG CCAGAGCAAT CTTTTAAAAG CCTGTGACAG ATCACTGTTA CTCCTTGGCT
44821 AGAATTCACA CCACAGCCTA CAGGCGCCTG CACAACCTTG TTTGTGGCTC CTCTTCTGAG 
44881 CCCATTACCT ACTTCTTGGC CTCTACTCCC CAGCACTACT TGTTTATTTT TTTCAACCCG 
44941 AGCTTCTTAA CCAGGAGTTT GTCTACTAGG TGACATGTGG CAAAGTTTAG AGACATTTTT
45001 GGTTGTCAAG ACTGGGGGAG TGCTCCTAGC ACCTAGTGAG TAGGGAGGAC AGGATACTGC 
45061 TAGACATCCT ACATGCAGAT GGTAGTCCCC CTTCCCACCC CCACGCCGCC CCCCCCCCCC 
45121 ACACACACAC ACATGAGTAG TGCTGAGAAA ACCCGCTTTT TAATCCAACT TGCCAGGCCC 
45181 ACTCAGTTTG CCTGGGAAAT ACTGCTCCCA GTCAATATCA TTCTTATTTC CTTCATGTCT
45241 CTGCTCAAGT GTCAGCCCCA GAGTGACTTG CCCTGACTTC TCTGCTTCTC ACAACACCCA
45301 TGATTTCCTG ATGTTGTATA TCTTTCTGCT CATTTGCTTA TTGTCATCTC TCCCACTAGA
45361 ATGCAAAATA TCAAAGGGTA AAGACTTGTT TCCCTGCTCT CTCCCTTGGG GCTTGAACAG
45421 TGCAACACAT GGCTGGGACT CATTTACACT TGTAAACAAT GAATATTTCT GCTCAACATG
45481 AAATTTTATT ATTCAACCTC TAATGCAGTG TGATGTTTAA GAATCATAGC TATGAAGTGG
45541 AGACATGAGC TCTGCCACCA AAGCCCAGTG TACCATTGAA TAAATTTGCC AGGAAGCAGG
45601 CCGTGCCATG CCTCATTCTT GTCATGTGTA AAATGTGGAT ACACGTAGTA CCAAAACTCA
45661 AAGTGCTGTG CTGAGGCCGG CGTGTGACCC ACAGAACACT GTGCTACACT ACAGGGCAAA
45721 ATCACTGTCA ACTAAGATTA GAAGCAGCTG TAGTACTTGA AATAACATCA GAAAACCAGA
45781 TTATTTATGT TCTTTGTAAC CTGAAAAGAG TTATATAATC TGAATTCCAG TTAACTTCTA
45841 GTAAAATAAA CGTATTATTA GCTCCTACCT CCCTATGCCT AGTGAAAATC AAATAAGATC
45901 AGATATGAAT GTAACTTAGA AGTGAGTGCA TTGCTTACAT GTTCATTATC AGTACTTTGT
45961 AGAGAGGCCT CTTAATTACA CAGCACATTG CAAATCAATA AAGCCTAGCC GAAAAGAGAA
46021 TTGTTCAGTT CAAACGTTCA AAACTAACAT ATACTTAATT TTCCAGGCAA AAGAACAATT
46081 GCCAAGAGTG GGGAAAGGCC CGAGGTAGGC CTCTCTCAGG AGCCTCCCAC CCTAGAGACC
46141 TCCACCCCAG GTCTCACCAA AAGTGGGTGG AATGGTGAAG AATTCAGATC CCCAACGCCA
46201 CTCTTTCGCG CCCCCACCGC CCAACGCATT CGTTCTGAGG TGGAAACCCC GTGCGGATCC
46261 TGCTGTGGGT TTGCTCAGCC TTCTCGGCAA GCACTCAGGG AAGAACTTCC TGTTTGGAGA
46321 TGACTGGGGA AAAAACTGCA CAGCTGACAT TGGAAATAAA CCCGAGTTCC AGGTTCAAGG
46381 AGCCCCAGGC TTAGCTCAGC TCAAGTGAGG AACTACGAGA TTTATTTAAA AGCATTCTAG
46441 TTGGGGGAAG GGAGTGGGCG GTTCCAAAAG TCACTCCGCA GAGCCGGGAC AGCCGGGGGA
46501 GGGGGCAGGT CCTGGGGCGA GGGACCCCTA TCTGCAGTTC AGTGGTAGGC ACTCCCTCAC
46561 GGGGTCTGGA CGCAGAAAGT AGGGAGAGGG GCTTGCGGAT AGGGTTGAGC AGGTCCTCCA
46621 AAGTTAGCAA ACTCCCAAGC GCAAAGAAAA AGCTAGTTTC GATTTTTCCA CCCCCGCCGC
46681 GCCCCTAGTT CGCCCGCAGC CCTCGGACTC ACGCAGCAAG CGCCCCTGCA GGACCGCGGT
46741 CTGCAAAAGC ATCAGGAGGA GAAGCGCCGG CCTGGCTCGC GGGCCCATTT CCCCAGCTCT
46801 GGCCGCACGT CCCCGTTAAA TCTCCGCTTC TTTTGGGGGG CGGGGAAACG GGGATGGCTC
46861 CAGAAGTCAC CCTACAGCTA TTGCCTAGGC TCAGGAGATG CCCAGTAAAA CTTCCTGGTG
46921 AAAAGCAACA GGTCTTTCAG AACTTTAGTT CTCTCTCTCC TACAGCAGAA GGTACCTGCT
46981 TGTGAAACAC TAGGTGATCC AGTGTCCCCC TTGGTTTTTA AATCCTGAAG GGGTGTTGTT
47041 GATTGGGGAA AGTAGCTTCG CAATGTTCTG ATCTGAACTT TAGATATTTA AATATTTATG
47101 ATTTTCAAAA TTCAATCATA CATTTAAAAA TTTTATCTCA ACCTTAGACC AACTTATGTC
47161 TTATTTGACT TAGAAATATA AAGCTTTTTC ATTTTGTTTT TTGATTCAAA TTAATTAAGT
47221 CATAACATTA ACCAATTAGA TCCTACTGAA ACACCTTCCA CAGCCTTCAT AATTGAATTA
47281 TCTGACAAGT GTTTCACAAA CTTTACAGTA TTGGGATTAT CTGGAGAATG ATTAAACATA
47341 TTGAGGCCTG CTCCTAACCC CAGACACACT GATTTAATGQ GTAATTGTTA GGTAGTTAGA
47401 CATTAGCAGT TGGGAGGGGA TGACAGAAGA GAGCGGAAAG GCTGTCACTA AGACAGCCAC
47461 TGGCCCACCT AAATTCAGGC CCAAGACTAC CCTAATGCCA CCCTAAGGGA TGGAGTTTAT
47521 GATAAAGTCT GTGGCCAAAA TATCCTGGAG AAAGAGAAAG GAGGGTACAG GTGGAAATTC
47581 CCTAAGGTGG CACATGCCCA ACAACACAAA AGCCTGTCTT CAAGTTCACC CCAAGTTCAT
47641 CATGCCATCA TTATAATAGA ATTTACATAC AGTTTTGCCC CCCCATCCCT GGGAGGCTTT
47701 TCTTAACAAA TTATAGGTAA GACCATGCAC AGTTTAATTT TAGATTGTAT AGCTATACAC
47761 TTCAATCAAA TAACATCATC CTGTCACTCA GATACAGCCC AAACCTCAAC TCCTCCCCAC
47821 AAACCCCATA AAAGCACCTT GAGCTCTGTA AAGAAGTGCT GAGTTCACTT CGCAGAAATA
47881 AGCCCGCTGT CCCTCAGAGT GTATTATTGT GCTTCAATAA ACTTTGCTTT AAGCTTGCAT
47941 TTTGGTGTTA GTTTGTAGTT CTTTGCTCAC TATCACAAGA ACTGAGATTG CTGGTTCAGA
48001 GCTCCGGCTA TAATAATCTC CTCGGTTAAA GGATCCATCC CAATGCATAA TTCCCAGTAA
48061 CAGTATGGGA TGCCACCTGG GCAATGGGAT TTTAAAAGCT TTCCTTCTCC CTCAACGAAG
48121 TTTGGGAATT ATTGCCTTAG ACATTTCAAA CAATATTAAT AAATTTAATA CACCTGATTT
48181 GCTCCAAACC TTTACATATC TAGCAAATTC AACAGGCATT ATTTTTGTAA GCATGTATGC
48241 AAATTTTGGC AATTCAAGAA AATCAAACAG GATATCAGGG CCTCGACTGT AGGCAAACAG
48301 ATACAATAAC ATTGGAAACA TGTAGAATAT TGATGATGGG CACATTGGGG CTGATAGTAC
48361 TATTCCTTTT TTTCAATTTT TGGTAAGATA TAATTAGCAT ACCATATAAT TCATCTATGT
48421 AAAATGCAAA AATTGGCCCG GCTCAGTGGC TCACGCTTGT AATCCCAGCA CTTTGGGCGG
48481 CCGAGGAAGG CAGATCACCT GAGATCAGGG GTTCGAGACC AGCCTGGCCA ACATGGTGAA
48541 ACCCCGTCTT TACTAAAAAT ACAAAAATTA GCCGGGCGTG ATAGCAGGCA ACTGTAATCC
48601 CAGCTACATT AGAGGCTGAG GCAGGAGAAT CGCTTGAACC CGGGAGGCGT AGGTTGCAGT
48661 GAGCTAAGAT CGTGCCATCA CACTCCAGCA TGGGAGACAA GAGCAAGACT TCATCTCAAA
48721 AAAAAAAAAT TAGCTGGGTG TGGTGGCATG CACCTGTAAT TCCAGCTACT CGGGAAGCTG
48781 AGACAGGAGA ATCGCTTGAA CCTGGGAGGC GGAGGTTGTG GTGAGCCGAG ATCATGCCAT
48841 TGCACTCCAG CCTGGGCAAC AAGAGCGAAA CTCCGTCTCA AAAATAAAAT AAATAAAATA
48901 AAATGCAAAA ATTAATGGAT TTTAGTATAT TTACAGAGAT GTGCAACCAT TACCAAAATT
48961 TTACATTTCT ATCTCCCCAA AAAGAAACCA TGTTCCCCTA ATTCAGTACC CTTAATTCAT
49021 CGCCTCCCAG ATTCCTCCAT TCTCCTCCTC CTCCCCTCCC AGCCCTAGAC AATCTTTAAT
49081 CTACTTTCTT TCTATTTGGA ACATTTAGTA TACATAGAGG CATATAATAT ATTGCTTTGC
49141 CGTGACTGGC TTCTTTCATT TAGCATAATG TTTTTATGTA TGTTTTTCAT GGACCAATAA
49201 TATCTATTAT AAGGACATAC CACAACATAT TTTATTTATT CATTCATCAG CCGATGGACA
49261 TTGGTTTGTT TCTACTTTAT GGCTATTGGG AATAGTGCTG TTATAAACAT TTATGTACAA
49321 GTTTTTTTGT AGACTTATGT TTTGATTTCT TTTGGTTATA TATCTAGAAG TGGGTTTGCT
49381 GGGTCATATG GTAACACTGT TTAACCTTTT GAGGAATTGC CACATTCTTT TCCAAAGTAA
49441 GCATTTTATC CTCCTATCAG CAGTGTATGA GAGTTCTGAT TTCTCTCCAT CTTTGCCTGG
49501 GTTTTTGAAT CAGGGCCCCA GATAGAACAA AAATGTGGTT ATTCAGTTGT TCCACCATCA
49561 CTTGTTGAGA AGACTCTTTT TTCATTGAAG TGTTTTGGCA CCCTTATCAA AAATCAATCT
49621 ACCATAAATG TGAGAGTTTA TTTCTGGAGT CTCAATTTTA TCCCATTATG CTATAATCTA
49681 TAATCCTATC TTrriTTTTT TTTGACAGAG CCTCACTCTA TTGCCCAGGT TGGAGTGCAG
49741 TGGCCCAATC CCGGCCACTG GCTCCTCCTC CCAGGTTCAA GCAATTCTCC TGCCTCAGCC
49801 TCCCAAGCAG CTGGGATTAC AGGTACCTGC CACCATGCCT GGTTAATTTT TGTATTTTTA
49861 GTAGAGACGG GGTTTCACCA TGTTGGTCAG GCTGGTCTGG AACTCCTGAC CTCAGGTGAT
49921 CTGCCCACCT CAGCCTCCCA AAGTGCTGGG ATTACAGGCA TGAGCCACCA CACCCAGACT
49981 ATAATCCTAT CTTTATGTCA GGACTACACT GTCTTGATTA CTATAGCTTT TTAGTAAATT
50041 GAATTCAAGA AGTTTCTCAA CTTCAAATTT GATCTTTTTT TGGAAGACTA TATTAGCTAT
50101 TCTCAGTCTG CTGAATTTCC CTAGGAATTT TAGGATCTAT TATCAATGTC TATTCTATTT
50161 TTGTATATGT TTTAATATTT TCATAAGAAA CTTTTTTCAT TTAAACTTTT TTTTTTAAGA
50221 AAAATAGTGA AAATCAGAAC ACTGGGGGTC AGGCGCATTT AACAGGCAGA AGAAGAATAA
50281 AAACTTGTCA TATAAACAAA AAAGAAATGA CCAATCACAT TGTGGAAGCC ATGGAGTGGT
50341 TATAGGTGCC AAAGGCTGCA GAGAAATGGT GTCAGATATA CCTGAAAATT GTCCATTGTA
50401 TTTGGCCATT AAGAGACTTA GAAGACTTAA GCCATAGATT GCTCAGTGAG ACCCCGAGGG
50461 CAAATGGTCT GAAGGTGAAT AGATCATTTC ACCTTTAAGA GAGCAGGTAG GAAGCTATAA
50521 ATCCAAGATT AAAAAGTTGA CTGAACTGTT AAGGAAGAAA CTCTAATCTT GAGCCACCCT
50581 ATCCTGGCTC CACCTTCTGC TGCAAGCAAA CAGAAATGCT GAAATTCAAC ACTCACAAAG
50641 GCTGGTAAGC TGGAAATGAC AAAAATTACT CCTGGGAAAG TCAGATTTAG AATTAGGCCA
50701 TATTTGTTGG GGTTCAGATT TTCATGTACA CTTGGGAAAG GGTTTAGCTT ATAGGCACAT
50761 GCATGAAGGG AACTGGTATA GGGCTGTGTT CATAAGGTCA AGAGTTGAAG GCCAGGCATG
50821 GAGGCTCTTG CCTGTAATCC CAGCACTTTG GGAGGCCGAG GCAGGAGGAT GGCTTGAGCC
50881 CAGGAATTCA AGACCAGCCT GGGAAACATA GGGAGATGCT GTCTTCACAA AACAATTAAA
50941 AAATAAAATT AGTCAGGTGT GGTGGCACAC ACTTGTGGTC CCAGCCACTC AGGAGGTTGG
51001 GAAGATCACT TAAGCCTGGG ACATTGAGGC TGTAGTCAGC CATGATAGTG CTACTGCACA
51061 CCAGTCTAGG TGACAGAATG AGACCCTGTC TCCAAAAAAA GAGCTGTATC CACATCCCAG
51121 GAAAGTGGTT GAAGATCTAC TTTTCTCTGT AAACCTAATA AAGAATAGAG TGACAAATGT
51181 GTGTTGTGGA AAGAAATGGG GTGAGAGCTA CGTAGATGCA AAACAATACA TCCCCACATA
51241 CCACTTQTTA ATCATCCTTT TCCACCCACT TATGGGATGA ATTGCATCTC CCCAAAAGAT
51301 ACTCTGTCCT AACCCTCAGT AGCTGTGAAC CTGACCTTAT CTGGAATACG GTGAGTTCAC
51361 TGGTTAAGAA GAGATTATAG TGGAATAGGG TGAGTCCTCC AACCAATGAC TGGGGTCCTC
51421 ACAGACACAG AGGGATGATG GCCAGGTAGA GATGGAGGCA GAGATTGGAG TTATGCTGCC
51481 ACAAACCAAA CACAGGAAGC TGCTAQAAGT GGAAACAGGC AAGAAAGAAT CCTTCCCCAG
51541 AGGCTACAGA GGGATCTTGG CCCTGATAAT ACCTTGATCT CAACTGGCCT ACGTAACTGT
51601 GAGAGAATAA ATTTCTTTTG TTCTAAGCCA CCCAGTTGAT AGTACTTTGT TACGGCAGCC
51661 CTAAGGAACT TGATATACAT TTCTTTTACT GTCATAGAAG TTTTGAATCT TTTAAGTAGG
51721 TCTGTACCCT TCCTCCCAGT GTCAACACAT GGAATTCCTC TCCTTGTGCC TTGAAAAGTG
51781 AAAGGTGTTT GAACTGGTAA TGAAAGAAAT CTCAGCATGA GGCCAGATGC TGTACCTCAC
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51841 
51901 
51961 
52021 
52081 
52141 
52201 
52261 
52321 
52381 
52441 
52501 
52561 
52621 
52681 
52741 
52801 
52861 
52921 
52981 
53041 
53101 
53161 
53221 
53281 
53341 
53401 
53461 
53521 
53581 
53641 
53701 
53761 
53821 
53881 
53941 
54001 
54061 
54121 
54181 
54241 
54301 
54361 
54421 
54481 
54541 
54601 
54661 
54721 
54781 
54841 
54901 
54961 
55021 



ACCTGTAATC 
TAGACTACTC 
TAGCCGGGCA 
GAACCCGGGA 
GAGAGAGCAA 
ATAAAAATGT 
GTCTTTGCCA 
CAATTGTCAC 
CATTTATTTG 
GTGCATTTTG 
TATAAAACTC 
AGACACTGAG 
CTCCAAAAAC 
AAGGGGGTCT 
CTGCATGCAC 
CTGGAATCTA 
GTGCAACACC 
TTTCTTTGGA 
ACCCCCTTTG 
TCTTCTTGAA 
TGAGCTAAGG 
AGCAGTAAGC 
AGCACTCGGA 
GGCACATGTG 
TCTATGTGTA 
AGCAAGTAGT 
CAGGCCACAG 
AACCGTATGA 
GCCCAAGTAG 
CAATTTTCTA 
GAAGCATATA 
GGTTTAGTAG 
GTATGCCCAC 
TGGGTCCACA 
GAACTCCACT 
AGTCTTCCCA 
ATTGAGGCTT 
GGAACTGGGT 
GTTCCTCCAC 
TAATTGCAAA 
ATCATAAGAA 
CCATATTTAC 
TTTTCCAGTG 
TGGTACAGGA 
ACCAAGTTGC 
CATGACAAGC 
CTATTTCTAA 
TGGGCATTCC 
ACCAGTCCTC 
CATAACACAC 
ACAAACAAGT 
TAGCAACTTT 
GAGGAAGGTT 
TGATGAGTTT 



TCAGCACTTC 
TGGCCAACAT 
TGGTGCCTGT 
GGTGGAGGTT 
GACTTGGTCT 
TTCCCCTTCC 
ATGTTATTTT 
TATGTTCTGT 
TTTCTCGTGT 
TTGTTGTTAA 
ATGTTTAACA 
TTAAAGAAGG 
CGAGCTCCCT 
GTGTGAGAGG 
CAGTAATCAG 
TAGATAACAT 
AGGCTGTCTG 
GGCAGAAATT 
AGAATCTCAC 
AGACAGATTG 
TAGTGATGAA 
AGGTTTCTAT 
ACCATTTTTC 
CCACTTTTGT 
GACAGCAATT 
CGAGAGCCAA 
TAGTCAGGGC 
TTCAGTTGAG 
CAGGCCCTITA 
TAGCTATGCT 
CAGGGAAGCC 
TGTCAATAAC 
ATATCCAGTA 
CAGTTTGCAA 
AGGTGGCTGT 
CAGGAAGGGT 
TTAGGACCCA 
CTGTAGGTAC 
ATACATACAT 
AACAAATTTC 
GGTTTGAAAT 
TCAAGGATCC 
AGAATCAAGG 
AGGGCCACTT 
CTAAATGACA 
GTACTTATTT 
CTTATTACTA 
TTTTTCTTCT 
AGTCCTCAAT 
ATCAGGTTGG 
TATTTTTAGA 
TTGTCCTACC 
AGTTGAAGTC 
TCTCATGTTT 




GGGAGGATGA 
GGTGAAACCC 
AGTCCCAGCT 
GCAGTGAACT 
TAAAAAAGAG 
CCCCAAACTT 
TATTATAACA 
AAAAATCACT 
CATACTGCAA 
AACAGCTTTT 
CTTATTTTTG 
AAGGGCTTTA 
GAGTGAGCAA 
GTCATGATCG 
AACAGAACAG 
AACCGGTTAG 
CCTGTGGATT 
GGGCATAAGA 
TCATTAGTGG 
ATAATGATTC 
GCTTTTTATC 
TAATATTATA 
AAACATGGCC 
CATATTTCTA 
AGTAAGGTTA 
TCCATTTTGA 
TCTGCTGGTC 
CATGTAAATG 
ATATTGTATG 
TTTTTTTTTT 
CAGGAGTTTG 
ACAACTACCT 
TAATCCAGTG 
CTTTGGGAAT 
TTTTATAGTA 
GAAGTCCTTC 
GAAGTTATCA 
TAATTCTCGT 
AACATGAAGT 
TTGTTTTTCC 
ACTGGCTCAG 
AGTCCAGCCC 
GGGTTGGTTA 
TTCCCTTTCT 
CAAGACCAGT 
TCTGCCATAT 
TTAATGACAG 
GTTTTGGCTA 
CTTATTTCAA 
TCATTTCTTQ 
GTCTTTGTAC 
TCAGTGACTT 
TTTACTGTGC 
CGGCCATGCA 



GGCGGGCAGA 
CATCTCTACT 
ACTCAGGAGG 
GAGATCACGC 
AAAAGAAAAA 
TAAAAAAGCA 
AAGGAATCTT 
TCCTAAAATG 
TGGATATCTG 
TTGGCCTGTC 
TAGCAGGACA 
TTCAGCTGGG 
TTCCTGTCCC 
ACTGAGCAAG 
GGATTTTCAC 
GTCGGGGGTC 
TCATTTCTGC 
CAATATGAGG 
GAGTTCTCAC 
ATATAGTACA 
ATTTGGAGAA 
ACTCCTATTA 
CCAGAAACAA 
ACTATGTCTT 
AATTTCCTAC 
TAGATAGCAT 
TTATTAGTAA 
GGGGTCCCAT 
ATTCTCTCAG 
TTTTTTTTTT 
CCTGTCTTTA 
GCCCACTGGT 
GGGGCTGTCC 
TTACTAAATA 
CTATTATACA 
CCCACTTTTG 
GGGTGAGTCT 
GCTTCCCATG 
GACATTGAGA 
TGGAATTTCT 
GGGAGCATTT 
CAACTATTTC 
TTACTAGTTC 
GAAGGTGGAC 
ATCTACATTT 
AGCCTCTTTC 
CACAGGCATC 
ACACTTTACT 
AAACTGTGGT 
GGCTACCTAC 
ACTTATAATA 
GATGTATACA 
AAGTCCAAAT 
TGGACCAGTC 



TCACTTGAGG 
AAAAACAAAA 
CTGAGGCAGG 
CACTGCACTC 
TGAAATTTCA 
GAAGTCTGCA 
GCAAGGCTAC 
TCTGAATTGA 
TCTTGTTAGT 
TTCTTCCACC 
AGCTACAGAC 
AGCTTTGGCA 
TTTTAAGGGC 
TGGGGGTATG 
AGTGTTTTTC 
AATCTTTAAC 
CTTTTAGCTT 
GGTGGTCGCC 
TTTTATTCTC 
CTTGTGCTGA 
GTACAGGTAG 
TAAGAGTTTT 
ATCCATACCA 
CAACTACTTG 
AGACCCCTCC 
TTTGCATCTG 
TTATTTCTAA 
ATCCCCACAA 
GGGGCCATTC 

TGGGCAGTAG 
CAGGTAATTT 
AGTCCCGGTG 
GATTTTTCTT 
GTTTTTGCCC 
CTATACAGTA 
TTTGAGCTGG 
GCCATTGATC 
GACTGGGCTA 
AGTACTGGCA 
ATAAACTTCT 
TAAGGTTACA 
TAAGGGGTTA 
AGGATTCTTT 
ATTTCCACGC 
CTAATGAACA 
AAATTTCAAG 
CGTATCGTTT 
CGTGGGAGGC 
CTTGTATAGA 
ACCATAAAAT 
CTGGGAACAG 
TTTAAGGAAA 
AGCTTCCGGG 



TCAGGAGTTC 
AATGTTATCC 
AGAATTGCTT 
TAGCCTTGGT 
GCATTATAGA 
TCATAAAATG 
CAGATCTCAG 
CTGCTTGTCT 
ATAAATATTT 
TATGAGGTAA 
AAAACCCCTC 
AGACTCACAT 
TTGCAACTCT 
TGACTGGCAG 
CACACAATGT 
CAGACCCAGG 
TTACTTTTTC 
TCACTTATTC 
ACTACCTATG 
AGCATTTTGG 
CAAACAAGGA 
AAATCTTCTT 
CACCTACATG 
CCCTTAATCA 
TTCAGTTGCT 
AGTTTCTTGC 
GACAGCTTGT 
GCCGTCTTGT 
ATTATTTTTC 
TTTTTTGCGG 
GAAGAAAGAT 
GGCATAAGCT 
GGACTCTGGG 
AGTGTGGTTT 
AAGGCAGCTG 
TTGTCTAATG 
GAATTTATCA 
TCCCATTACA 
CATGCTCAGC 
CATTCAGTTC 
CCTCAAACCA 
CGATCCCCTT 
CACTGACCAC 
TTATTTTTTA 
AGTCTTAATT 
GAACCACATC 
GTGACTTGTT 
ATGAACCCCC 
TCAGATGGGT 
ATAGCATTAT 
AATAAGACTG 
CCCTCAGTCT 
ATGAGTCCCT 
TGTGACTGGA 

55081 GCAGGGCTTG TTGTCTTCTT CAGTCACTTT GCAGGCGTTG GCGAAGCTGC CACGTACAGC

55141 TCACAGTCTA CTGATGTTCA AGGATGGTCT TGGAAGTTGG GCCCACTAGA ATTAACTGAG 

55201 TCCAATACCT CTACTCAGTC ACTTTCAACT GGGCTTTCTG ATACCAGGAG CAAGGTGGCA 

55261 GGTTTTAGGG TGTTGCAAAT TTCAATGGTT ATGCAGGGAT TTTCACATAG CAAACTTTGG 

55321 TACTTGGTTA ATCTAGCATT TGTTAGCCAA TGATGTATTT ATTAAAGTCA CCACAGCATG

55381 GAGGGCCTTT AAGTTTAGGT TTTGTCCAAG AGTTAGCTTA TCTGCCTCTT GTGCTAGCAG 

55441 GGCTGTTGCT GCCAAGGCTC TTAAGCATGG AGGCCAACCC TTAGAAACTC CATCTAGTTG 

55501 TTTGGAGGCC CAGCCTCGGC CAGGGCCCCA CAGTCTGGGT CAAAACTCCA ACCGCCATTT 

55561 TTTCTCTTTC TGACACATAG AGTGTAAAGG GTTTTGTCAG GTCAGGTAGC CCCAGGGCTG 

55621 GGGCCGACAT GAGTTTTTCT TTTAACTCAT GAAAAACTCA TTGCTGTTGG TTGTAATAGA 

55681 TGTAGTTTAT CCAATCTACA TTTTTATTAA CTGTCACCCA CCAAAATATT GACTCAAATC 

55741 CTGCAGCTAT TTGATTTTGG GATTTAAATT GATCTGCTAT TCCCTGTGGG ACTCCAATTG 

55801 CATCTAAATA GATGTGAGAG TTGAAAQACA CATAAGGGTC TTCTCTTGCT TTACGATGTC 

55861 TTATTTTTCC TCCCTCTGGT TGATGAAATG CTAGGGTGAA AGGGATAGCC AATTGGACTA 

55921 AAGTACAAGT GCCGCTCCAG TTATTTGGCA GAGTGCCCAG TAAAGGTCCA CCACAATACC 

55981 ACCACACATC CGCTTGGGGA TGAACAAAGG CTGACTGATT GAGAAGCTCC TGAAAATTCT 

56041 TAAGCTCACT GCATCCCTTC AGGTCTCCAA GGAATGCTAA GTTTCCTCCC TGTCATGAGA 

56101 GACAAGAAGT GAACTTAGTT TTGGGAGATG GAAGCTGGAT GGCCCTCAGG GGTTGACCTG 

56161 CAGGGTGCTG GACTTTGGGA TATAGCAGAG AGAGCTTGGC ACGACTTATT ACTCCAGGCT 

56221 GTAGAATCCT GGAAAACAGT TACCATGCAG CCCATGCCTG GTCAACAGGA GGACCACCTT 

56281 AGTGGAAAGG GGATAATCTG GCCCTCTGGC CTGCCATGTG CACAAGCATA ACAATTGGTT 

56341 TTGTTTAATG TGTGGACAGA ATATTTGATC CATTCCAACT GGGCATTTGC ATCTTGGTAT

56401 CCTGCTTAAT TATCAAAGTT TGTTTTAAGT CTTTAACTTC TATGACCCTC TAGTAAAATG 

56461 AATGTATGAT TTTAGGAAAT TACAAAAACC GGTTGGGGCA GTCCATCCTT GCTCTTTAGT

56521 GGTCCACACA ACATTCGACC AACTATGGCA TAAAAGCTCT ACATCGGGGG GCAAGACTCC 

56581 TCGTTGACAC TGGGGTCTTT ATTGAAATCT CTCTGGAATA AATGGTCTCA GTTTACTAAG 

56641 GCTCAGTCTG AGGAGAGTCA GGAGGGACAG AGGTACTTTT CTGAAGTACA GAGATGTCTT 

56701 CGACTTGGCA AGTCCCCACA GGGTATAACA AGGCAAGCAT TAAATTCAAT AGTTTGAGGC 
56761 AAAATTGACT TGGTTATGTT AATAACTAGA TGGTCAGAAA TAGAGTGAGG GAAGAAGAAA

56821 GAGTAATAGA ATAGATGAAG GAGTTAAATT TTTCTTAGCT TTAGTTTGGT AGGGTTTTCC 

56881 CCTGGGACTA TGGCCCATGA CTCTGGAGGG GGTGGCACTT TCTTGACTCG GGTGTGATGA 

56941 GTCCATCCCT TTTTCACCGT ATGAACAACA GTCTCGGTGG TTAGCAGCAC AAGGTAGGGT 

57001 CCTTCCTAGG CTGGCTCAAG TTTTCCTTCT TTCCACCCTT TGATGAGAAC ATGATCTTCA 

57061 GGCTGGTGCT GGTTTACAGA AAATTCTAGG GGTGGTACAT GTGCTAAAAG ACTTTTAGTT 

57121 TTGAGGGAAA GGAAAGTGGA AGATAAACCA AGTATATAAC TTTTAAGAAG TTGACCTTTT 

57181 GTTTTAAATG TGGGGACATC AGCAGTGGAC TTTATAGTCC TTGGTGCCTT CTTACTGAGA 

57241 AATTTCCTTT AGCACCTATT TTTATTAGTT TTTAGACCAA AGAAAGTCAA ATGCCATTTT 

57301 ATATTTGACA ACGCTTCTTG TATGTTTATA CCAGATAAGC TAGATTTCAC CTTTATATTG 

57361 GTGTGTTATT AATGTTAAAC TTAGTTTTAA TAAAACTCTG TAGACATATT TATTTGATTT 

57421 TTAATGTCTG ACCATAAGGT AAGATTTTTA TAGACTTTTC TTTAACCTTT TATAATTTTT 

57481 GTTAAAGAAC AGGTTAGTGC TTTAAGAAAA ACCCGTTGTG TTTTTATTTT AATGTTCAGT 

57541 TCACAGAAAA ACTGTATGAT ACCCCTTAAC TTTAGCCAAT ATGTTTAGAC ACAGAATTTT 

57601 CTTTACAATT AAGGTTTCAA AACTTGCTTA AACCTTCAAA ACAATTTTTG TAACCTTTTA 

57661 ATGTAGGTAA AAATCCACAT TCTTATGCAT CCTCATAATC CTTTTACCAA AGGTATATTT 

57721 TACTTTCCTT ACATACCTTG CACATAAACT GTTTATTCAA TAGTTTTACA TTTAGAAGGA 

57781 GGCCTAATTA CTTTTAAATT ATACAACATT TCTTACATAA ATTTATTTTT CTAACACACA 

57841 TTTTTTTCAT GACTTTCACA GACAATTCTT CGACATGCCT CAACTTTCTG ACTTATTGCA 

57901 AACATCCCTT TCTTTAAACA ACTAGTTAAT TTATCTCAGG ACAAGGATTT TCCATACAAC 

57961 ATTCTTTTTT ATATAAATTC TGCCTCCTCT TTATTTCCTT TTTTTTTTTT CCGAGGATGA

58021 TAACCATTCT TTTCCAAAGC GAACTTCTTT TATGTCTGTG GACTAGACTG TCTAAGGCCA 

58081 CAAGATTAGA AGTTACTATA ATACATGTTA CACTGTTAAC TTTTAGCAAA CTTTACTTTT 

58141 GTTGAAAACC TTGTAAGTTT GGGATTTCAA TTATCCTTTG CTATTAATAA GACCTTATTT 

58201 AGTCCAAATT AACTTAGAAT TGGTATAGAT GGCTTTTTTT TTTTTTTAAT TACCTGGGAG 

58261 GAACCATCTA TCCTCCTGTC CTGAAGGGAG TTCCTCCTAG GTCTGGTCAG AGCTTTGTAT 

58321 GGTAATTAAG ATTTAGATCC CCTGTTAGGA AACCTGCCGG GTTAAGAGAA TTTTCAGTGG 

58381 TTAATGTTAA ATCATCTTCT TTTTTCTTTT TTCCTTAGGA TACTTCTGAA CCGGTGAGGT 

58441 GTGCTCACAA TGAGGTTTCC TGTAAAAGTT ATTTTTTTAC TTTCTTCTGT TAGCAAAGCA 

58501 GTTGCCGCTA CAGATTGAAT GCATTTGGGC CATCCGCGGG TTACTGGGTT AAGGATTTTT 

58561 GATAGGAAGG CCTTAATGCT TTTGGAATAT GCCCTGACAA CAAAGTGCCA GTTCCTTCCC 

58621 GGTGTTCAGC CACTGCGTTG ATCCTCCACG AGGGCCTGCC ACGTGCTGCT CTGGTGAGGC 

58681 GTTCCACCGG GGCAATTGCC TACCTGGGAG CGCTCTCCAG ATCTGTGTCG CTCAAACTGG 

58741 CTGGAGTTCC CCGTAGGGAT GCTCCACAGG GCAGGCCTAA GTCGCCTAAG GGGCTGCCTT

58801 GACCGTCCGT TAATCACCTC TGTCTCCAAA AACCAGCTCC CTGAGTGAGC AATTCCTGTC

58861 CCTTTTAAGG GCTTACAACT CTAAGGGGGT CTGCATGAGA GGGTCGTGAT TGATTGAGCA 

58921 AGCAGCGGGT ACGTGACTGG GGCTGCATGC ATCAGTAATC AGAACAGAAC AGAACAGCAC 

58981 AGGGATTTTC ACAATGCTTT TCCATACAAT GTCTGGAATC TATAGATAAC ATAACCTGTT 

59041 AGGTCAAAGG TCGATCTTTA ACCAGACCCA GGGTGCGGTG CCGGGCTGTT TGCCTGTGGA 

59101 TTTCATTTCT CCCTTTTAAT TTTTACTTTT TCTTTCTTTG GAGGCAGAAA TTGGGCATAA 

59161 GACAATATGA GGGGTGGTCT CCTCCCTTAA TTTAAACAAA ATTTTCAAAG TCCTACCCCA 

59221 AGTAAATTGG CAAATATTAA TAAAGTTATG GCATAGAAAA TAAAAATGAT TGTAAAAGGC 

59281 GTAAAGATAT TTCTGTGGGG AAAACATTTG TTCATTAGTT ATCAGTTAAA ATTCTGTGAA 

59341 AAATAACCAC TAGAGACCCT AAAGTACCCA GGGGCTAATA ATAAGAAGGG AGGAACACCC 

59401 TCrCACTCCC CACCGTTACC TGCCCAGAAG GGAAGAGGAA GAGGGTGACT CCAGGAGAGC 

59461 TGTGGTCTCC CCTCCCCATA TGTCCACATA TACCTGACCT CCCCTCCCCA AAATATATAC 

59521 CCAATATCTC TCCCATATAT ACATATTTAT CTGACCTCTC CACATATGTA TACCTAAACT 

59581 TTCTCTATAT ATCCACATAT ACCTAACCCT CTCACACACA TATAGCTGAC CTCCAGTGGA 

59641 GGAAAATGGG GAAGAGAGAA GAAGTTATCA AAGGATAAAT CTAGGTCATA CTCAGAAATG 

59701 TGAAAAACAA AAACCACACA CAGAAAAAAA AAACACACAC AAAAAAGAAA TTGATAAATT

59761 TGTTTGTGTC AAAATTAAGA ATTCCGGTTC AATGAAGGAT CCCATGGATA AAGTTAAGAC 

59821 ACTGCTGTAA GGATGGTAGA GAATTAAATG TCTGAATCAG ACGAAAGGAT GAGTAATTAG 

59881 AATGCACAAG GCCAAGAAGA ACAAAACAGA AACTCCACAT AAAAAATGTA TGAGGCCGGG 

59941 CGCGGTGGCT CATGCCAGTA ATCCCAGCGC TTTGGGAGGC CAGGGCGGGC CGATCAGGAG 

60001 TTTGAGACCA GGCTGGCCAA CATTGTGAAA CCCCATCTCT ACAAAAAATA CAAAAAATTA 

60061 GCCGGGCGTG GTGGTGGGTG CCTATAATCC CAGCTACTTG GGAGGCTGAG GCAGGAGAAT 

60121 CACTTAAACT CAGGAGGCAG AGGTTGCAGT GAGCTGAGAT CACACCATTG CACTCCAGCC 

60181 TGGGTGACAG TGTGAGACTC TGTCTCAAAA AAAAAAAAAA TTATATATAT ATATATATAT 

60241 ATATATATAT ATATATATAT ATATGAAATA AATGAACTU^G AAATTTAGAT ACAGGAAAAT 

60301 CCAAAGCACT TGGTAATGAA AGAAAGGTAA AGTGATGTGT CCTTTTGCAT TTAAAAGAGA 

60361 GCATTAACAA ATTAGAGAGC TGAATAATGC TCAGTATTGG TGTGGATATG GAGACTCAGG 

60421 AATCCTCATA CACTGCTGAT GGGAGTGCCC ACTCCCTGGG AATATTTTCC AAATATCATC 

60481 TCAAACATAT CCCATAAAGG TGACAGGAAA GTGTGGGCTG ACTGATATCC TTCACTGAGA 

60541 GAGGTGGAGG TAAAATGAAG TCACTGCACA ATATAGAGTT GGAAGCAATG GATTAGATGT 

60601 CCACATAGTT ACGTGGAAGA ATCCGTAAGA TACACACACA CACACACACA CACACACACC 

60661 TTTGTGTATA TTGTTCCTGG CAGGTAGGCA TGGAGGTTTA GAGGCTTTCT ACATCACACC 

60721 TACTGCACAC AGTAAATGGC CAGGCTGAGC ACTGACTTCC ATGAAGGGAG ATTGAAGGTA 

60781 AGAGATTGAA GATTGTTCCC TGGTCT6GGA CCCTGCAACT GAATATGCAG AAAAAAGTAC 

60841 ACCCCGCCAC CCCGCTTCCC ATCTTTCCTA CCTGATTAGA ATAGCTTTTT CAGAAAACGT 

60901 TGGCCAGGGG TTGTGGCTCA CACCTGTAAT CCCAGCACTT TGGGAGGCTG AGGCGGGCAG 

60961 ATCATCTGAG GTCAGAAGTT CCAGACCAGC CTGGCCAACA TGGCGAAACC CCATCTCTAC 

61021 TAAAAATATA AAAAATTAGC AGGGCATGGT GGCACACACC TGTCATCCCA GCTACTCGGG 

61081 AGCCTGAGGC AGGAGACTCA CTTGAAGCAC AGTGATGGAG GTTGAAGTTA GCTGAGATCT

61141 TGCCACTGCA CTCCAGCCTG GGCAACAGAG TGACACTTTG TCTCAACAAC AACAACAAAA 

61201 CCCACCAAAA CTTTAAATCT ACCTATGGCC AAATGCCTGC TAAAATGAGC ACCCAAGAAG 

61261 CAGTGTTCAG GAAAGTCAGA TGAATACCCT AAAATTAGAT GCAATGTTGG CTGGTCACAG 

61321 TGGCTCAGGC CCTGTAATCC CAATCCTTCT TGGGAGGCCG AGGCGACAGA TCGCTTAAGC 

61381 TCAGGAGATC GAGACCAGTC TGGACAACAT GGTGAGACCG TGTCTCTACA AAAACGTACA 

61441 AAAATQAGCT GGGAGTGGTG GCGCGCACCT GTAGTCCCAG CTACTCAGGA AGCTGAGGTG 

61501 GGAGGATCTC TTGAACCCAG AAGGCGGAGA CTGCAGTGAG CAGAGATCAT GCCACTACAC 
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61561 CCCAGCCTGG ATGATAGAGC CAGACCCCCA TCTCCAGAAA AAAAAAATAA AGAGAGAGAG 

61621 AGATGCAATA TTTAGGGTTC AACAAGACTG AATTTCTGAC TCCTTTCCCT ACCTCTCCAG 

61681 CATGTTAGAT TCTGGGTCCT TCATCCTAAC CCCCTGTTCA TGCCATAGCC ACCCTGTGGT 

61741 ACCAACTTTG GAAGCCTGGA TCTTCATCCC CTCATGATAA TGAGTGTCCC ATCAGGTCTC 

61801 CATGCTCAGC TTGGCAAGAG TATCTGTCTT CTCCTCATGG GACGGTCACA TTCACCCAGC 

61861 ACTGACAGGT TCCATTCCCA CTAGGGTGGC ACCCTATATG GTCTGAGTCC AGGCCTTCCT

61921 GGTCCCTCAG TAATCTCAGC ATGGTAGCAC AATCGAAAAG GGCTAGGCAC GGCAGCACCA 

61981 TTTCCCACCA AGAGGTCTGA TGGCTCATCA CATAGACTGA AGGAGATTCT GAAGAGCAGA 

62041 GGTGGAATGA AGAATGAATC GTGGGCTCTG CTCTTCCTAG GCCTGTCTTC CTCTCTCCCG 

62101 AGATGTTAGC TAACTCATGA GAGCCAGAAA CCAACTGCAG GCTGGCCTCA GGCACTTAGG 

62161 TAGTGCTTCA GCCTCAGCAG TCCACATTCT AGGAACCCTC ATAATATGGG TTGAAGTATG 

62221 CATTCCCACA AAAATAAAGT TGTTGAAGTC CTAACCACCA GTACTGAAAT GGGAAAAGTT 

62281 CCCTTGTCCC GCTCGCATGG CATGTGATAG GAGTGTGGCT AATTTCTTCA GTGCCTGGCT 

62341 GCTCAAACCT CTAGGGGAAC ATTAAGACGG GCAGGTTGTG GGTCTCCAAC CCCATGACCC 

62401 CACCACAGTG TCTAGGGTTG AATGTTTACA GCTCCTGAAG CCACAGTGGG TGTGTGTTAC 

62461 AGGGTGCTCT TTTAGTTTTG CCATTTATAG GCAGCTGGTG TTAACCAACT CAATTAGACC 

62521 GTCTACCTTG TCCCAAGGAC AGAAGAAGGC TTTCTGTATC CCAGGTTCTT GCCTTGGTGT 

62581 ACCGGAATAA ATCAGACCAC ACCTGGGCTT AGAGAAAGAG TGCAAGGTTT TATTAAGTGG

62641 AGGTAGCTCT CAGCAGTTGG GCAAAGCCAA AAGTGGATGG AGTGGGAAAG TTTTCCCTTG
62701 GAGTCAGCCA CTCAGTGGCC CAGGCTCTCC TCCAACCACC CCAGTCAAAT TCCGCCTCAT 
62761 TTTGCCAGGC AAACGTTTGT TGTGTGCTCT TCTGCCAGTG TGCTCCCCTG GACGTCCAGC 
62821 TATTCGTGTC TTGTGGCAGG CCAGGGGAGG TCTTGGGAAA TGCAACATTT GGGCAGGAAA 
62881 ACAAAAATGC CTGTCCTCAC CGTGGTCCCT GGGCACAGGC CTGGGGGTGG AGCCCTAGCC 
62941 GGGGACCACG CCCTTCCCTT CCCCACTTCC ATATCATTTA AAGGGACCAT GCCCTTCCCT 
63001 TCCCAGCACT TTCCCCCTCC TGTATCAGGA CCTGTGAATG TGGCCTTATT TGGAAATAGG 
63061 GTCTTTGCAC TTCATCAGTT AAGATAAGAG TGGGCTCTAA CCCAACATAA AGGGTGTCCT 
63121 TATAAAAAGG AGAAATGTCA TACACAGAGA CTGACACCTA TAGAGAGAAA ATGTGGTGAG 
63181 TAQACACAGG GAGAATCACC ATTCAAGTCA AGCAATGAGT CTGGGGATAC CAGAAGCTGG 
63241 GAGAGAAACC TGGAACAGAT TATCCCTCAT TGCCTTCAGA AGGAATCAAA CCTGATGATA 
63301 CTTTGATTTC AGACTTCCAG CTTCCAGGAC TGTGTGACGA TAAATATCTG TTGTTAAGCC 
63361 AACGAGTTTG AGGTACTTTG TTACTGCAGC CCCAGAAAAC TAATACAGTA GGTACTATGG 
63421 ACTGAATTGA CTCCCCGTCG CAAAATTCAT ATGTTGAAAC CCTAACCCCC AGTGTGATGG 
63481 TACTTGGAGC TGGGGCGTTT GGGAAGTCAT TATATTTAGA CAAACTCATC AGGATGTGTC 
63541 TCTCATGATG AAATTCATGC CCTTATTAAA AGAGACAACA GGCCAGGTGC AGTGGCTCAT 
63601 GCCTGTAATC CCAGCACTTT GGGAGGCTGA GGTGGATGGA TCACCTGAGG TTGGGAGTTT 
63661 GAGACCAGCC TGGCCAACAT GGTAAAACCC CATGTCTACT AAAAATACAA AAATTGGCCA 
63721 GGTGTGGTGG TGCACGCTTG TACTCCCAGC TACCTGGGAG GCTGAGGCAG GAGAATCCCT 
63781 TGAAACCAGG AGGTGGAAGT TGCAGTGAGA TCACACCACT GTACTCTAGC CTGGGTGATA
63841 GAGACTCCAT CTCAAAAAAA AAAAAAAAAA AGACAATAGA GCCAGGTGCT GCAGCTGATG 
63901 CCTGTAATTC CAACACTATG AGAGGCTGAA GCAGGAGGCT CGCTTTAGCC CAGGAGTTCA 

63961 AGACCAGCTT GGACAAAATA GTGAGACCCC CAACTTCTAA AAATTTAAAA AATGAACTGG

64021 GTGTGGTGGT ACACATCTGA GGCTCCAGCT ACTCTGGAGG CTGAGGT6GG AGGATTGCTT
64081 GAGCCCAGGA GGAGGCTGCA GTGAGCCATT GCTGTCCAGC CTGGGCTACA CGAGAACCTG 
64141 TCTCGGGAAA AGGAGAAAAC AGTGAGACCT CTTTTTCTCT CCTCCTTCTC TCCACTGCCT 
64201 AAGCCCTACA AGCACAAAAA GGACACCACA TGAGCACATA GTGAGAATGC TGCTGCCACC 
64261 AACAAGTCAG GAAGAGAGCG TTCACCTAGA AACTGAATTG GCCAGCACCT GGATCTTGGA 
64321 CTTCTGAGCT TCCAGAACTG TGAGAAAGTT ATTTTTTTTT TAGCGACTAA GTCTATAGTA 
64381 TTTTATTACA GCAGCTCAAG GTAACTAACA TAGTAGAAGG GATGAATTAT GGAGATCACA
64441 AGTCCACGCC TCCAGAAAAA GACTTCCCTA AAAATTAGTC TGAGCAAAAT TCGAATGATG 
64501 AATTATTTTT AAGAACTTTT AAGGGATCTG ACAAGTTTGC AAGAGCTAGA GAATGCTTTA 
64561 CAACGTGATA ATAGAATGCT CTGTGATGAC AGAAATCTTT CCACACTGTT CAAAACTAGC 
64621 TACTGGCCAC TTGTGACTAT TGTGCACTTG AAATGTGACT GGTGTCTGAG GAGCAGAATG 
64681 TTTAATTTTA CTTAATTTTA ATTCATTACA ATAGCTACAT GTAGCTAGGG GCTACTGGAT
64741 TGAACAGCAC AGCTCGAGTC TTTTAGAGGG AGACAGGACT CACCAAGATG GATGCTGGTG 
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64801 GCCAAGCAGC AATGGCAGGT AGTACACACA CAAGAGGCAG ATGATACAAC ACATCCTTCC 

64861 CAAACCTGGA GATAAGCTCA CCCCACAATC CCGCCGCTGA AATAGAGTTG ATGTTACCAA

64921 TGTGCATTTT TATGTCCTTT TCCATACAGA AAGATCATTC AGCAAGTACT ATGGTACTTA

64981 AAAAACAACA TTCAATTCAT TATTATGACA AAATTAAATT AATAGCTCTT CCTTAAACTT 

65041 TTAAATTCAA TTTACAATGC TTACTATTGG CATTTATTAA TCTACCAATT TTTTCCCATA 

65101 GAACCCATAG AACAAATAAT CTACCAAATT TTTAACATTC ATTTTTGGCA AGGCTTTTGC 

65161 AATTTGACGA ACTTTAAGAA GAAAACTTAT AAATTGCAAT TTTTAAATCT GACATACTGG 

65221 ACTTTTAAAG TATCCAATTG ACTAATGAAC AAAACTGCTC CAAATTTTTC AATTCTTAAA 

65281 AATCTTAAGA CAATACTTAA TATGGCAAAT CTTAACTTCT TAAACTTTGT AAGAATGCTA 

65341 ATCAACTTAG ATTGGTATAA AGTTGAGTTA AAAATCACAG GATACATCAT CTCAGCTATA 

65401 AGTTTTCATG AGTTGAGTTT TTACAATCAC TTGAAATGCT TAGAATAGGA AATACGTATA 

65461 AATTATTTAA CATAAAATAT TGTTACAAAA CCTCTGGAGT GTCAGTTTCT CTGGCCAGAC 

65521 TTTATGCTGC AGCACCTTTG CCTGAGTTCT TGTCCTGCAT CCAGGAAGAA TTAGGTACAG 

65581 AGGCAAGAGT CAAGAAGATT AGTTTTCCAA TAGTTCAGCT CACCTAGTTA ACTCCTGTTC

65641 ACAATCTTCA AAGTTATCAG AAACCTGCAA TTGAGGGTTA TAATCCATTC TTTGCAGAGT 

65701 TTCAAAACAA GACAACATTT GTCTATGAAT GTTAAAATGT CCTAGGGTAG TCACAGTCAA 

65761 AAACACAATT GACAAAGAAA TTTAGTCACC TCTGTGATTT ACAATAGCCT AACACAATAA 

65821 CTCTAATTAT AACTGATGAC ACAAACTCAG ATATCAGAAC TCTAGAAATC CCCTATAATT

65881 TTGGAACACA CATTCACAGT TTTCACTGAA ATATGACCTG AAGATCAAAT ATCACCTTAT 

65941 TTCAACAATC CTATATAACT AAACGTGTCA AATGATCCTG TTTACCTCTC CTTTGGATAC 

66001 TCCAGGGGCC CTCTGTAGCA TCCAAAAGTT AGGGGTTAGC AAAGACAATT TTGAAGCTGT 

66061 AAAGGCTCAA AACACTTAAT GAACCTCTAG TCATATCTGT TCTCTACTCA CTAAATGCTA 

66121 GTAGCACCTC TCAGTTGTGG CTAAGCTGGG AGGATCTCTT GAGCCTAGAA GTTTGGGGAC 

66181 GCAGTGAGCT ATGATTATGC CACTGCACTC CAGCCTGGGC AACAATGCAA AATCCTGTCT 

66241 CAAAAACAAA AACAAAAAAC AAATTGCCTA TGCTGTGGTT ATCTCACAAT TAATAAAAAG 

66301 GAAAAAAAAA GTATGCAGTC TTTGTAGGTC CTTGGGGTTT GTTGGAACTC AGAAAACAAT 

66361 ACCCCAAAAT AAAGACCGCA GAAGCCAAAG TTTTTCTCTG ATCTTCTCCT GCCCTCCTGT 

66421 CTCTGAGTCC CATTCTCCCC GGAGTCTAGC CATAGAAATG AGAATTCCTC TTCCTCAAGT 

66481 TAGGTCATAG AAATCAAAAC ACCTTTTCCC CAGAGCCCAG CCATAAAACC TAAAAATATT

66541 ACTCTAACTT TCCCTCTGTT TTTCTGTGTA AAAACTGGCC ATAAAGAAAT TATCTGAACT

66601 ACCTTATTTG ATCATAGATC ACCAGACCGC ATTCCAGAGA GGATCCAGAA GGAAGGAATG 

66661 CTGCACAGAG AGGCGAAGAA GAATCTAGAC AGACAGGCCT TGCTGGGTTT CCCTACTCTG 

66721 TTTATTAGCA ATCCTATTTC TACACGGCGG CCCATACTTT GTTGAATCTA AAAAATAAAA 

66781 ATGGACAATT TCCCCTGTAC ATGTTAATAC ACATTAATAA ATTGGATATA AATTGGATAA 

66841 TTTATTAATA TACACATTAA TAAATTGGAT GCAGCCGGGT GCAATGGCTC ACGCCTGTAA
66901 TCCCAGCACT TTGGGAGCTG AGGCGGGCAG ACCACGAGGT CAAGACCACC CTAGCCGAAA 
66961 TGGTGAAACC CCGTCTCTAT TAAAAATACA AAAGTTAGCT GGGCGTGGTG GCACATGCCT 

67021 GTAGTCCCAG CTACTGGGGA GGCTGAGGCA GGAGAATTGC TTGAACTCGG GAGGCGGAGG
67081 TTGCAGTGAG CCGAGATTGC GCCACTGCAC TCCAGCCTGG TGACAGAGTG AGACTCCGTC 
67141 TAAAAATAAT AATAATAATA ATAATAATAA TAATAATAAT AATAAATTGG ATGCATTTTA 
67201 TCCTATTAAT CTTCCTCTTG TCGGTGGTTT TCAGCGACTC TTCAGAGGCC AAAGAGTAAG 
67261 TTTTCCCTTA GCCCCTACAG GTTCTTATGT TTAATTTGTT ACTCTCATTT AAGACATAAT 
67321 TAAAGTGGCT TCTCCATGAA GATTATTTCT GCATCCATTA TTTGGTAAGA TTGGCCGTTT
67381 TCTCCTTTGA TCTCTACTTC ACACTGACCC ACATAAAACA TCACTGCCTG TTTTTTTGTT 
67441 GTTGTTGTTT GGAGACGGAG TCTTGCTCTG TTGCCCAGGC TGGAGTGCAG TGGTGTGATC 
67501 TCCGCTCACT GCAAGCTCCG CCTCCCGGAT TCACGCCATT CTCCTGCCTC AGCCTCCTGA 
67561 GCAGCTGGGA CTACAGGCAC CCACCACCAA GCCCGGCTAA TTTTTGTATT TTTAGTAGAT 
67621 ACGGGGTTTC ACTTTGTTAA CCAGGATGGT CTCGATCTCC TGACCTCGTG ATCGGCCCGC 
67681 CTCAGCCTCC CAAAGTGCTG GGATTACAGG AGTGAGCCAC TGCGCCCGGC CCCGTTTTTT 
67741 TTTTTGGTTT TTGCATGTCT TCTCCCTTTT ACTGTAAACT ATTTCCACTA CCAGCGTAGT 
67801 TATCATTTCT ACTGCTTAAT AATTGTTTTG GGGAAGTGAA TGCATCAACC CACATGAATT
67861 TCTTGTCTAT TTGACAATTT ATTCTCTTTA GGAATAGTAT TAACTCCTAA GGTCCTGGGA 
67921 GCCAGTCTCT GTACTTGGCT GCTCCAGGGT CCTACTTCAG TTTCCCAGCT TCTCAGTACT 
67981 GTCACTGTCA ATTGTGGGTA ATAATTATTT TTGTCCACCA AAAGACTCTG TATGTGAATG 
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71281 CTTTCTGCTC TTCCACACTA CCAGCTCAGC TGTGCTCTCT ACATGCAGGC AGTTTTACAA 

71341 GTTTCAGATT AGCCTGGGAC TTCCAGGGTT TTGAATGGGT TAGGGAATGG GGAACTTTTG 

71401 GGTTTACTTT CCATTTTTTC TTCATACATA TGTAATATAT AACATAAATC TATGGTATAT

71461 ATGATAAATA TATGGCTACA TATGAACTAT ATAATCACAT ATATGCATTA TAAATAAATA 

71521 TTAATTTTAT AATATTTTAA AGGTTATCAA ATAAATATTA ATATAAATAA TTAAATAATT 

71581 AATACTCAGC TTTGTTTTCC AAAGTGATAA ATGCCTATAT TTAGCAAAAT ATTTTTTGGA 

71641 GGCCTGATAG TTTTTAGGAG TGTAAAGAAG TCCTGATATC TAAATGTTTA AGAACCACTA 

71701 TTTTAGGCTG TTGTCTTCTG TCTTATTTTC CCAGCTAGAC TGGTAAATAC TTGAAGGCAA 

71761 ACGTTTAGCC AGCACATTAA CATTTTATGT TTTTATTCTT TTGTGCTCTC AGTGGCTGTG 

71821 TCTTTTCTAT CGATTTCTCA CACTGTATGA TGGTTATATT TGTCTGTATC TGTCCCACCA 

71881 GGTATAAGTT CTTGAGAGGA CACACTGCTA GGCTGATCTT AGTTTTTATT ATTTCTCCTG 

71941 GTGTCCTGTG CTTAACAAGT GCTCATTAAG TGTGTAAAAA CACAGCACAG TAAAAAACTA 

72001 GACATTAAAA AATAATGTCA ACCAATCTAT TGAAATTTGC ATTTCCATGT TTCTTCCAAT 

72061 ATAGTCATTG TGTCAGGTTA TGTACTTATT CTGATGAAGA CTATTGCCTA ATATACGTTT
72121 GCATCTTGTG CTTTATAACT GCCTTCATAT AGACACAGAT TGAGAAGGTG TAAAAATGTG 
72181 CATATCCTCA CAATTGACTlA ATTCTTATCC TTTGAGGGTA GGTTTGACTT TCTGAAATGC 
72241 TTTGACATCA TTTGAAAGAA GCTTGAAGAA TAAGATAGCT GTTAATGACC CAGTTTCCTA 
72301 TGTCACTTAT ACAATTATAA TGGCAATTTC AAAATGTTAG GTAAATATAT TTTGCAATAT 
72361 ATTGTTCCTT TTGTAATACT CTCTATGTAT TTATTTATAT TTTTAAATTT TATATTTATG 
72421 TATTTATTTT TCTGGACAGA GTCTTGCTCT GTTGCCCAGG TTAGAGTGAA GTGTTGTGAT 
72481 CATAGCTCTC TGCAACTTCA AACTGCTTGG CAAAAGTGAT CCTCCTGCCT CAGCCTCATG
72541 AGTAGAGTAG CGGGAACTAC AGGCGCATGC CACTGCACCC AGCTAATCAC TATTTATTAT 
72601 GCTCCTACTG TGTGCTTTAG TATATTTTCT GTTGTTTTCT GCAACCCATT TTGAGGGCGT 
72661 GTTAGGGAAT ACAGATGCAG TAACTTTCGT CTCAGCCCTT GAGGTGAGGA AATATTTAGC 
72721 CTCAGGTTTA ATCTAATTGT TGGCCATTTG CCTTCAAAGA TTGAAATATG AGCAAAACTG 
72781 TGGCTCTGGG TTATATGTTA AAAAAAAGTT TATGGGGCTG AAGCCAGGCA ACAGACAAGA 
72841 GCCCCTACAA TCTTATTTAG GCTGAAAATA TCCTGGAGTC CCTGTATTGT TGGTCTCAAG 
72901 CAGATAGCAA CACTAACACT TACTCTTTGA GGCAGGCACT GCCAGTGGGG TGGCTGTTAT 
72961 TATTAGCTTC ATTAATTGGT GAGTCAGGAA AAAACAGCTT TAAATCATTC AAAGTTCTGG 

73021 CCTATACAGG ATTTAGTAAT ATTAGGTTAG CTACATCCAA AAGATGACAG AACCCTACTC
73081 TAAGGCTGGG CTTGGTGGTT CACACCTATA ATCTCAAAAC TTTGGGAGGC TGAGGCAGGA
73141 GGATCACTTG GTGCCAAGAG TTTGAGACCA GCCTGAGCAA CATAGTGAGA CCCCTGTCTC 
73201 TATCAAAAAC AAAGAACTCT AATTGGCATA GTAGAAGGAA AAAGTGAAAG AAAAACCAGC 
73261 TGTCACCCTC ATTCCTTACA CCTGTCCTAA CAACTCCTCT CACTATCCTT TGAATATATC 
73321 TTGGCTGTTT GAGTCTCTCT CTAGCCCCAT TACTGCTGTT TGGACTTGAC ATTTTGCTCT 
73381 GCATTTTTAA CTTTTCTACC AGGGTTTCCA GACCCTGAAG AGTGTGGCAT GAAACAAAAC 
73441 TAGTCAACCT ATAATATTTA TGATGTGTGT GTAAATAAAA GAATACACAA TATATTGCAT 
73501 TACAATATTT TAACTGTGTC CTCAATTTGT TTGTGGCTTT CTTGAGGACA TCAGTTTTGG 
73561 GTGGGACGAC CACATCCTTA ATCTGAACTT TCCCTTGGAG GTCATTCTTT TTTTTTTGAA 
73621 ATAGAGTCTC GCTCTGTCAC CCAGGCTGGA GTGCAGTGGC GCAATCTCAG CTCACTGCAA 
73681 CGTCCGCCTC CTGGGTTCAA GTGATTCTCC TGCCTCAGCC TTCCAAGTAG CTGGGATTAC 
73741 AGATGCACGC CACCATGCCG AGCTAATTTT TGTATTTTTA GAAGAGACGG AATTTCACCA 
73801 TGTTGGTCAG GCTGGTCTTA AACTCCTGAC CTCATGATCT GCCCACCTCA GCCTCCTAAA 
73861 GTGCTGGGAT TACAGGCGTG AGCCACCCCG CCCGGCCAGA GGTCATTCTA ATAGACTTTT 
73921 TTTTTGTTGT TGCTCACAGG CTTGTTCAAT CTTATTTCAA AATTTGAGAA ATACAGTTTC 
73981 CATGGAACAC CAACCAGATA TCAGGTTGCT ATGGAGTTGA TAGTCAAAAG CTTTGTATCT 
74041 TCCAGTTTTT CAGAATGGCT TCTAAAGGTT CTGATTCAGA GCTCTTAGGC GAAATTGAAC 
74101 AACCAAGTGT CAAAGTACAA CATTCAGGAA GTTAAAAACA TGACTGACAT ATATGTACTA 
74161 TATATAGTGA GCTTGTGTAT GTGTCAATGA ATGATTTAAT TCATTAATGA AGGAG6AAGC 
74221 AGAATCACAA TTAGGTCAAA GGAAGATACG GGAGAATAAA ATATGTATTT GGTCAGGGAA 
74281 AGGATGTATA CTGGAAGAGG AAGGGAAAAT CAGATATAAA GTTGTTTAAT GACTTATTAG 
74341 GCAATACAAT AATAACTTTT AGGGTCATTT TTTCTATATT AAGAATTCAT TTCCATCTCT 
74401 ATQACAAAAT CCTTATTAAT TTATTAAACT TCTACAAGTG AATGTTTACt TTTAGATAGT 
74461 CTGGACCCAA TAAAATGTAA ACATTAAGTC AGAGTTACTT TCACGTAGGA CAGTGTTGTC 
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CAATAAGGTA 
TAAAATGTTC 
AATGTCAACT 
TATATTAAGT 
ATGTGACCAC 
TATTGCCTTA 
CTTGGAAGGA 
CACACTTTAT 
ACCATGTTGC 
GGTCTGGACT 
CATATTAGAG 
CAATTTATAG 
ATGATGGATA 
ATCTAAGTGA 
CAGGCTGGAG 
CTGGAGTGCA 
TCTCCTGCCT 
ATTTTTGTAT 
CTCCTGACCT 
AGCCACCAAG 
CGACTGAGTC 
AACCTCTGCT 
CCAGCTAATT 
CAAACTCCTG 
TGGGCCACGG 
GTGCTTCAAT 
GAGGAATAGC 
TAGACTGTTA 
CAAATGTTAT 
TTTGTCATTG 
CTCTTTAACA 
ATATTACCTT 
TACTTTGCCT 
GAAGTAGTGA 
GAGGTGATTT 
TTCTTGTCTT 
ATCTTAAAAG 
TATATTTGAG 
TGAAAGTAAC 
TAAATAAATA 
CAAATCTATT 
CTTGAGGGCC 
AGAGCCCAAT 
GAGGGATTTG 
CCCCGGACCC 
GCCGTGGAAT 
GACCTGTGTT 
TTTCGCTTTT 
CCTTCAGGGG 
TGCCCCAGAC 
TTTGTGGAAA 
AAACCCAAGT 
AACACAACTT 
ACCAATCAGA 



CCACTAGCTA 
TAAAAGTGTA 
GTCTTTTTTT 
TAAATAAAAT 
TAGAAATCTG 
CATCATCAGG 
TATGGAGAAA 
AAAGGATCTA 
CAGGAGGTTG 
TGAGATTTGC 
AACTGAATCA 
TGAAAGAAGG 
TACTTAGCTG 
AATGTTTATT 
TGCAATGAGG 
ACGAGGCAAT 
CAGTTTCCTG 
TTTTTTTAGT 
CAGGCGATCT 
CCTGGCCTAA 
TCACCCTGTT 
TCCCGGGTTC 
TTTGTACTTT 
GCCTCAGGTG 
GGCCCAGCCT 
TGTTTATACA 
CGGTCTAAGT 
ATTCCCAGAG 
TTAATAAAAC 
AACTCTTATT 
CATATTTTCA 
TGTCCCTAAA 
TTAATCTCAA 
ACCTTAAAGT 
TTCAGCTCAT 
TCTGTGTTAA 
TCAAGAGTGT 
TTCCCAACTG 
CTACAATTTT 
AAAAATGCTT 
GGCTTTTTTG 
AGACCTCCTG 
TTCTCGCCTG 
ATAGTTTCAA 
TAGCAAGGCT 
CCTTGTCCCA 
ACTTCCCTTG 
TTCTTCAAAA 
CAAAGGAGCG 
TTCCTTCGGA 
AAATAAATGA 
GATTTGGTGC 
GGGAGCAGCG 
GCGCGCCTGC 



CACGTGATCA 
AAATACACAC 
TTAGCTTATT 
ATCTTAAAAT 
GAAAGTATTT 
TACCCCTITAA 
TATTTTTGCG 
GAAAAGGGTT 
GGGACAAGAT 
ATATAAAGAG 
CAGCGATTAA 
TCCAGTTACC 
AGTTTTAAAT 
TTATTTTTTT 
CAATCTCGGC 
CTCGGCTCAC 
AGTAGCTGGG 
AGAGATGGGG 
GCCCGCCTCA 
GTGACATGTT 
GCACAGGCTG 
AAGCGATTCC 
TAGTAGAGAT 
ATCCGCCCCC 
TATATTATTT 
CTTTCCATAA 
GTTTTTCCAC 
GACATAAGCA 
AATGGGGTCA 
TGTAGGTTCC 
TGAAAACATA 
TATGAATCTA 
GAAAAAAATA 
AGCAAACTTT 
CAACAACAGA 
ATTTTGCTAT 
GTTTTATTAA 
GAGATTGTCC 
CATGGGCTGA 
GTTTTCTTTG 
CAGGCTTAAG 
CCTTACACAA 
TAGAGAAGTG 
TGTCTTCAAA 
CATGAACCCC 
GTCCACAGTT 
TGAAGAAACA 
ATAAGGGAAG 
AACAGGTAAT 
GTTGGGGGAA 
AGAGCATGAA 
GGGGAATTTT 
CAGCGGCTCA 
GCTCTATATA 



TTGACCATTT 
CAGGTTCTGA 
TATTATATGT 
TAATTTTACT 
ATGTGATTCA 
GTAGGCTTTT 
TTGCTTTTAA 
GGTTACATGT 
TCTGGGTGGC 
ATGTGATTAG 
ATTTACATGT 
TGGTAATCAA 
GAGAAGGGGG 
TTTTTTGACA 
TTCTGGAGTG 
TGCAACCTCC 
ATTAGAGTTG 
TTTCACCATG 
GCCTCCCAAA 
CTTATATTGT 
GAGTGCAGTG 
CTTGCCTCAG 
GGTGTTTCAC 
GAGTCTCCCA 
CTTTTACTAC 
TTTTGTATAA
CACTGCTAAT 
CACAAGCAGA 
CCCTTAGTCT 
CTTTTGACTT 
TATTTGAGCA 
TAATTATATC 
GCAATTACTT 
AGAACAGAAT 
TCTTATAATA 
TTAAAAAAAT 
AGTCAGTTGC 
TATATGGTAA 
AATTCATTTC 
AAAACATATT 
GGCTCTCCCT 
CTCAGAGGGG 
AAAAGGATGC 
TCAAAGATTT 
CTCCCATCCC 
CCTGTGCGAC 
GAATTATCAT 
CATGTGCCCA 
TTATAAGAAA 
TTGGGGACGC 
GCCCOAGGCT 
AATATTTTTC 
GAGCCTGCCA 
TACAGCGGCC 



GGACTATAGC 
AGATTTATCA 
TGAAGTGATA 
TGTTTCTTTT 
CATTCTATTT 
TAGATAATTC 
GTTTTGCATA 
TTCTCTGTCT 
TGGATGTCCT 
ATTGAGTCGA 
CGATTTATAA 
GACGTTTCAT 
TTCATTGCAC 
TGGAGTCTTG 
CAATGAGGCA 
ACCTCCCGGG 
CCTGCCACCA 
CTGGCCAGGC 
GTGCTAGGAT 
TCCTTTCTTT 
GCGTCATTTC 
CCTCCTGAGT 
CATGTCGGCT 
AAGTGCTAGG 
AATATATTAG 
TTCTTATACC 
TCATCCATCA 
CAATGTTTAC 
AAAAGATGTT 
TCCCACAATC 
GAAATTGTTG 
AAATATATGG 
GGGGTCGGAG 
AGTTTCAGAG 
AATTACATGT 
AAATTTCAAA 
TTTATTTGCA 
CTTGCGTAAG 
TATATTGCAG 
ATCTCAGTGC 
TGTTCCTTTA 
GACCTCAGAG 
CCCACCCCCA 
AAGTCTGTAG 
GCCCTAATTG 
TGCACGAAGA 
GAAAATTTAG 
ACCACCCCTG 
AACAGAAAGT 
CTGGACGCGT 
TCTGAGATCC 
CCCTTTTGTG 
GCCAGGCGGG 
CTGCCCAGGC 



TAGACTGATT 
TTTAAAAAAG 
ATAGTTTAGA 
CATTCTTTCA 
TACTGTCTAG 
TCTAATATAG 
ACTTTTTCAA 
TCTGGCCTCC 
AATGGCTTGA 
CTAGAAAAAT 
ACCAGGACAC 
AGCTATTTTC 
ATAGAATAAG 
CTCTGTTGCC 
ATCTCGGCTT 
TTCAAATGAT 
CGCCAGGCTA 
TGGTCTCGAA 
TACAGGCGTG 
CTTTTTTTTT 
GGCTCATTGC 
GCCACCACCC 
AGGCTGATCT 
ATTACAGGCG 
TATGATGCAG 
CTGTCACTCT 
CTAATCTCAT 
AAATGTTGGA 
TCACTTTTCA 
TAAGGCTGTT 
GGGAGTTGTA 
GCAGACAATT 
AGTAAAATAA 
GGGATGAGAA 
TCTGGTACTT 
TACATTGTTC 
ACTCAAAAGA 
GTATGGTTAC 
CGTACAAAAA 
CTCTAACTGC 
TGATCTCTAT 
CTCTTTAAAA 
TCTATGAAAA 
CCCCCCACCA 
CTTTGGACTG 
ATTCACAGAG 
GTGGAAACCA 
GGAAAAAGAA 
GGTCTCTGAC 
TGTTTTTGTG 
TTTCCTGACC 
AGGTGGAACA 
CGACCAGAGC 
GCTGCTTCAT 
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77761 CGGCGCTTTG CCACTTGTAC CCGAGTTTTT GATTCTCAAC ATGTCCGAGA CTGCTCCTGC 

77821 CGCTCCCGCT GCCGCGCCTC CTGCGGAGAA GGCCCCTGTA AAGAAGAAGG CGGCCAAAAA 

77881 GGCTGGGGGT ACGCCTCGTA AGGCGTCTGG TCCCCCGGTG TCAGAGCTCA TCACCAAGGC 

77941 TGTGGCCGCC TCTAAAGAGC GTAGCGGAGT TTCTCTGGCT GCTCTGAAAA AAGCGTTGGC 

78001 TGCCGCCGGC TATGATGTGG AGAAAAACAA CAGCCGTATC AAACTTGGTC TCAAGAGCCT 

78061 GGTGAGCAAG GGCACTCTGG TGCAAACGAA AGGCACCGGT GCTTCTGGCT CCTTTAAACT

78121 CAACAAGAAG GCAGCCTCCG GGGAAGCCAA GCCCAAGGTT AAAAAGGCGG GCGGAACCAA 

78181 ACCTAAGAAG CCAGTTGGGG CAGCCAAGAA GCCCAAGAAG GCGGCTGGCG GCGCAACTCC 

78241 GAAGAAGAGC GCTAAGAAAA CACCGAAGAA AGCGAAGAAG CCGGCCGCGG CCACTGTAAC 

78301 CAAGAAAGTG GCTAAGAGCC CAAAGAAGGC CAAGGTTGCG AAGCCCAAGA AAGCTGCCAA 

78361 AAGTGCTGCT AAGGCTGTGA AGCCCAAGGC CGCTAAGCCC AAGGTTGTCA AGCCTAAGAA 

78421 GGCGGCGCCC AAGAAGAAAT AGGCGAACGC CTACTTCTAA AACCCAAAAG GCTCTTTTCA 

78481 GAGCCACCAC TGATCTCAAT AAAAGAGCTG GATAATTTCT TTACTATCTG CCTTTTCTTG

78541 TTCTGCCCTG TTACTTAAGG TTAGTCGTAT GGGAGTTACT GAGGTATCAG ACGAATTGGG 

78601 TGACGGGGTT GGAGAGTGGC CGTGGTGAGG TTACAGCATT TAAACCTTTA TTGCGGCTTC 

78661 TAGGTCCCTG ACCGGAGGCT TTTCTCGCTG GCGGATGGTT TTGGGATGGC AGTCCCGCCC 

78721 CAGGCCTGTG AACGGCAGAA AAGACCGCAA AACAAGAGCC AGTTTCTTAG TCTAAAGGGA

78781 TGTCCGGATT GGACTAAAAA ATTTTCAAAA GTCCCGCCCT GCTCCCGGGT TGGTCCGTTC 

78841 TTCTAGTACA TGACTTTCAT TCTGTATTTA ATTGGATGGT GGAAGACGTT GCTTATTCTG

78901 TGTTTTTTGC TTTACTGTGA CTTAAAAGTT TTGCCTCTTT TCTCTTTATA TTAATGTCTG 

78961 GGATTTCGGA CGCTTTCCAT GTTGTTGGTA GTCAAGTTGA TGTCTCCTGG AGGTAGTGGC 

79021 AACATCCAGC CCTGGGAGGA GAGTGCGTGC AGGTACCTTT GTCCTACATT CCTCTGCTGT 

79081 TAATTTCTCA TTCCTGTGGC AACGAAGGAA TGCATTTAAA AAACAGCCAC AACAGCGGCA 

79141 ATAGCCCTTC CTCCACCCAA GGCAATCGTG GACCTAGGGA GTTTTTTGTG CCACATAACA 

79201 TGTAGCCTTC CGCTAAACTG ACAGGTTTGA GCGTATCGAT TTTGAGCGTA TCGAAAGCAC 

79261 AACTTTTAGC CAGCCATTTT GTCCTCGCAT GACTACGGTT GCTTATCCTG TTTAGACAGA 

79321 CAGCAACATT TAAAAATCGA AGTTCCTTTA AACGTATTTT GTTTGGCAGT CCAAATGTTT 

79381 CTATGCAGAA AACAGTATTT GTACTATTAA CTATGAAGAG TGTATGGATA AATGGGAGAC 

79441 ATTTCTAATA AAGGCCTTCG TTAATGGTTC CCTCTGTTTG ACATCCATGG TGCTTCTGAA 

79501 TACAGAAAGC CTAGCGTCTT ATATTCGCTT CTTTTAAAAT CTGGTGGGCA CATTTTGGTG 

79561 AGACCTAAAT TATGGGGACT GGGGCTTCTG GAGATAAGCT GCTCAATTAT TCTACCATCT 

79621 CCACAATGAT TAATATAGTG AGTTGATTTG TTAGTGATAG TGACCACGGA TTCATCCCAA 

79681 GAAAGAGAAA GGGGAGGGAG GCAAGCAGM AGACAGGAAG ACAGAGGCAG GGAAGAAGGA
79741 GAAAACATTC TCCCATGGTT TAAGTAATTT TGTGTTGTTA ATTTTACATT ACAACACGGT 
79801 TTAACATGGT GAACCCTCTA TTTTGGTGTA AGGTTTAACA TATGGACATA TTTTTCCCAA 
79861 GACCATTTAT GAACTTTCAT TTCTGCTTCC CCCTTCTTCC TCCCGTGCCA CCCTCCACGC 
79921 TCCTATCAAT TTTGGCTGTT TTGTCATAGG CTAATACGCT ATAATTTCAT GGACAGTTGG 
79981 ACTGTCTTAG GTTTCTCAGG TTTCTATTTT GTTCCTTTAG TCATTCCCAC AATTCTTAAG 
80041 GTAGAATTGT ATTGTTTTAA ACATTGTGTT GTGTGCTATC CTCAATGCTG AGATGATTAT 
80101 GTGACAAATG GCAAGTGTTC AACTAATACC TAAATCTGTA GTATCTTATC AAGCCTAATG 

80161 CTACTTCACA ATGCCTACTC CATTCACCTC ACTTTATCTC ATTACTGGCA TTCTGTCATC
80221 TCACATCATC ACAAGTAAAA CGGTAAGCTA TTTTGAGAGA GATCACAGTC ATATAATTTA 
80281 TATTTATATT TATTTATTTA TTTATGAGAC GGAGTTTCCC TCTGTCACCC AGGCTGGAGT 
80341 GCTGTGGCAC GTTCTCGGCT CACTGCAACC TCCGCCTCAC GGGTTCAAGC GATTCTCCTG 
80401 CCTCCGCCTC CCGAGTAGCT GAGATTACAG GGGCCTGCCA CCATGCCCGG CTAATTTTTG 
80461 TATTTTTAGT AGAGACGGGG TTTCACTAAG TTGGCCAGGC TGGTCTCGAA CTCCTGACCT 
80521 CAGGTTATCC GCCCACCTCA TCCTGCCAAA GTGCTTAGAT TACAGGCGTG AACCACCGTT 
80581 CACAGACTCA AATCATTTTT ATTACAGTAT ATTGTTATAA TTGTTGTTTT ATTATCAGTT 
80641 ATTGCTAATC TCTTACAGTG CCTGATTTAT AAATTAAATT CATCATTGCC ATGTGTATAT 
80701 AGAAAAAAAC AGTGTATATA CGGTTCAGTA CTATCTGTGG TTTCAGGCAT CCACTGGGGG 
80761 TGCAGTTTAT TAAACATGCA TTTACATTAG TCTCCCCTTT GGGAGACTAA TTAACTGAGA 
80821 TGTTGTAACG TGACTTTAAT AGCAGATAGA GCTAATTTTC TCTCATTACT CTTCTTTTTC 
80881 AGAATTTTCC TGGTTATTCC ATTTTTTATT TTTCCATATG TATATTAAGA TCTCTTCCAC 
80941 CTCCTCCTGT TTCTCCATCT CAACATCAAA CAATTAAAAA AAAAAAAAAG GCTGGGCGCG 
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81001 GTGGCTCACG CCTATAATCC CAGCTCTTTG GGAGGCCTAG GCGGGTGGAT CACGAGGTCA 

81061 GGAGTTCAAG ACCAGCCTCG CCAAGATGGT GAAATCCCGT CTCTACTAAA AGTATAAAAA 

81121 TTAGCCAACC ATGGTGGCAG GCGCCTGTAA TCCCGGCTAC TCGGGAGGCT GAGGCAGAGA 

81181 ATTGCTTGAA CCTGGGAGGC GGAGGTTGCA GTGAGGCGAG ACCTTGCACT CCAGCCTGGG 

81241 TGACACAGCG AGACTCCGTC ATAAAAAAAA AAAGCCGGAA GCAGTGGCTC ACGCCTGTAA 

81301 TTCCAGCACT TTGGGAGGCT GAGTCAGGCA GATTACCTGA GGTCAGGAGT TCAGGACCAG 

81361 CCTGGCCATG AAAATACAGC CTGGCCATGA AAACACACAA TAAATTAGCT GGGCGTGGTG 

81421 TCACACACCT GTAATCCTAG CTACTCGGGA GGCTGAGACA GGAGAATCAC TTGAACCCAG 

81481 GAGGCAGAGG TTGCAGTGAG TTAAGATGAC GCCACTGCAC TCCATCTGGG CGACAGAGCC 

81541 AGACTCTCTC TCAAAAAACT AAATAAATAA AAATAAAGTT ATGGTACATT GAACTTCTGT 

81601 GTTCCTTTCT CCCTTAGATA CTTTCATGGC TACCCATTTA ATTGATGTTC TTATCATCTC 

81661 CAAGAGTTAG TCAGGAGAGG AATCAACCCA AGCAAAAATA GCTGATTTTC TAATTTTCCT 

81721 TCAATGCCCT TTGGGGTCTT AATCCATTTG ATTTATGTAC TTTCAATTAA TCCTAACCTC 

81781 GAATGTCTTC TGCAAACATG TTTCCACAGA TGAAACTCGT CAAATGAAAC ACATTCCTTT 

81841 AATTTATAGA GTTAAAAATT AGAAAAATTT TCAATTCTAT TTGGCCTTTA GATTCAGTCT 

81901 TGCATATGTT TTCTCAATTT TGTTCATGCT CTTTAGTTTT GTTTTATTCC ATCACAATTG 

81961 TTCACATAGC TTACTGGCTT AGGTCTAATG AACCATTCAT TTGGAAATTA AAATTGGCCA 

82021 TTTTAAGATG AAAAAGATTC TTGCCTCAAT TTTACTTAGT TTTTGAAACT GTCAATGAGG 

82081 ACACATGTTT TTCTGTACTC TTAGATTCAC TAAGTAGTGT CTTGCAAATT TAACTGACAA 

82141 AGGACAGATT AACATGCGAA AAAAAGAGCA TGCAATTTTA TTAGTATATT ACATGCACAG 

82201 AGTTCCCAAA GAAAAAAAAA TTGAAACCTT AAAAACGCGG TTAGACTCAC AGACTTATAC 

82261 ACCATTCCAA CAAAGGAAAG GGAGTTTGCA CTTCATGGGA TGACGAATTT GGGAATGTGA 

82321 CAAGGAAATA AATACATGGG CAATAAAAAC CATGGAAGAT AAAATGAAAG ATAGAAATAA 

82381 TTGTAGTAAG GTTTGTTTTT GCAGAGTCAT CTCAGTGCCT ACCTTCCATA TCTAGTGATA

82441 AGAATTGCTC TCTTTTTCCT GGTATAGCAG TTGGGGACAC TTTTACAAGG GAAATTTCTG 

82501 TCACCTTCAC AAAGGGAAAT TTGGGTAAAG AGAAGACAGA GACCTCTTCC TACACCTGTT 

82561 GATTTTCAAT TGCCTTCAGC TGAAAATAAC TTTTATGCCA AAGTAGAATA ATTTGGGGGT 

82621 GACATCCTGA TATTCTTCAA AACTTATATT TAATTTCACA TTAGTAATTA TATCATTTTT 

82681 GATTTTTAAA TTAGTTTTAT AAAATAATTT TGAAAAACGG TAATAATATT CAAATAATTC 

82741 CAGAAACACT GCTGATAAGC CAAAAACATC AATGAATATT GCATAAACAA CTGATAATTC 

82801 AACCATGAAA ATTTATGACA TTGTTCTTGT GTGATAAAAC TATGAGTAAC ATAAAAACTA 

82861 GAGGCTACTT GTAATGCATT ATTCCAAACT TTCTGTTTTT TATTTATTTA TTTATTTATT 

82921 TTGAGACATA GTCTCTCTCT GTCACCCAGG TTGGAGTGCA ATGGCGTGAT CTTGGTTCAC 

82981 TGCAGCCTCC ACTTCCCCGG TTCAAGCAAT TCTCCTGCCT CAGCCTCCTG AGTAACTGGG 

83041 ATTACAGGCA CCTGACACCA AACCCGGCTA ATTTTTTTGT ATTTTTAGTA GAGACGGGGT 

83101 TTCGCCATGT TTGCCAGGCT AGTCTCGAAC TCCTGACCTC AGTGATCCAC CTACCTCGGC 

83161 CTCCCAAAGT GCTAGGATTA CAGGCGTGAG CCACCATGCC CGGCGCATTA TTCCAAACTT 

83221 TCATACACAG TGCTATCATG GCTACAAATT GAAGTATCAT ATTATACACT CCTAGGCAAA 

83281 GCTCTGGATA TTTTGGCTAT ATAAGCCTGA GGGAAATGTA GTAAGGACAT TGTGGTTGAA 

83341 ATTCATACCA GAGATGAACA GGCCCAGTGC AAGACAGAAT TACATCACTA AAGGATATCA 

83401 GAAGAGAATA GGGATTTAGG GTACAGTGGC AACAACAGTT TTGGGAACTA GCATTTTTTG 

83461 AGCACTTATT TACAATATGC CAAGCACTGT TGCTGATTAC TCTATATTTA TTTTCAAACA 

83521 CATTCTTGTC ACAGCACTTT GAAGTAAGTG CCATTGTCAT TCCCACTTCA GGGTGAAGGA 

83581 CTAAAGCTTG GTGTCATTAA GGATGTAGCT AGTTAGCTGT GTGTGTGTGT GTGTGTGTGT 

83641 GTGCATTTTT TTTTAAATTT AAAGTCAATA AATTTTTATT TGAAGAATTT CACATCAAGG 

83701 TAAACTTTGT TCCTCTAAAG AGCTGGAGTC AAAATGTATC TTCAAAAGAT TCATCTTCAA 

83761 GTTAGCCCTT CTTAATAGAA CTGATGCTTA ATCCACAGTT GTCAGCCCAC AGTTCTTTTA 

83821 TTTTGACTTT TTTTTTTTTT TTTTTTTGAG ACGGAGTCTC TCACTGTCAC CCAGGCTGCT 

83881 GGGCAGTGGC GTGATCTCGG CTCGCTGCAA CCTCTGCCTC CCGGGTTCAA GTGATTCTCC 

83941 TGCCTCAGCC TCCTTAGTAG CTGGGACCAC AGGCGCATGC CATCGTGCTC GGCTAATTTT 

84001 TGTATTTTTA TTAGAGACAG GGTTTCACTA TGTTGGCCAG GCTGATCTCA AACTCCTGAC 

84061 CTCATGATCC GCCTGCCTTG GCCTCTCAAA GTGCTGGGAT TACAGGTGTG AGCCACTGCA 

84121 CCCGGCCTTA TTTTGCCTTC TTTAATCTCC ATTTGAACAT ACACATACTG ATGAAAACTA 

84181 CAACATTCTT CACCAAAAAT CTTTGGGATT TAATTTCTTC AACCACTTTA CTTTGGGGTC 
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84241 ATTTTAAGAT TAGGTGTATC TGCCTGGTTC TCAATTTGAC ACCCTTTCTC TCTAAACATG 

84301 AATGAGTTCC AATCATATTT ATTCCTAAGC TATCACACTC AAATATACTA CAGATCTGTG 

84361 GAATATGCCA AAAGTTAAGG TGAAAAATTA AATTATTAGG TATTTCATAG TTTTGCTAGT 

84421 TTTTGATCTG TGAGTGAATA TAACTATCCT CTATGTCCTG GCACTGTTCC TCAGAAACAT 

84481 AGGGTCCACA TATGTAATTT TAAATTTTTT AATAGGCACA TTTTAAAAAG TGAAAAAAGA 

84541 AATCTATTTT AATGATTTGA ATCCAGTGTA ACCAAAAATT GTTTCAACAA GGTATCTAAT 

84601 ATTAAAATAT TGAGTTTTTA CTTTGTTATT TTACTAGTTC TTTGAAATCT GGTGTGTATT 

84661 TTACACTTAA AGCACATCAC AGTTTGGAGT AGCCACATTT CCAATGCTTA ATACTCACAT 

84721 ATGGTTAGTG GCAACTATCT TGGACAGGAC AGCTTTTATA CTCTGGGAAG ACACAAGCAA

84781 ATACTTGCTC TGCAGCAGAA TCCAGATGTT TTCCAAGAAA ACACTTTTTC TGACCTGTTC 

84841 CTGAAACCCA GGTAGTGTCT CTAATACTTT ATATTTTATT GGTTTGTCCT ATTGTAACCA

84901 CCCAACGGGC TCTCCTTGTC CACTTCCTAG ACAGAGCTGA TTTATCAAGA CAGGGGAATT 

84961 GCAATAAGGA GCCAGCGCTA CAGGAGACTA GAGTTTTATT ATTACTCAAA TCAGTCTCCT

85021 TGAGAATTTG GGGACCAAAG TTTTTAAGGA TAATTTGATT GTAGGGGACC AGTGAGTCGG 

85081 GAGTGCTGCT TGGTTGGGTC AGAGATGAAA TTATAGGGAG CCTAAGCTGT CCTCTTGTGC

85141 TAAATCAGTT CCTGGGAGTG GTGGGGTGGG GGACTCAAGA CCAGATAATC CAGTTTATCT

85201 ATATGGGTGG TGCCAGCTAA TCCATTGTGT TCAGGGTCTG CAAAATAGCT CAAGCATTGA 

85261 TCTTAGGTTT TAAAATAGTG ATTTTATCCC CAGGAGCAAT TTGAGGTTTA GAATCTTGTA

85321 GCTTCCAGCT GCATGACTCC TAAACCATAA TTTATAATCT TGTGGCTAAT TTGTTAGTCC 

85381 TGCAAAAGCA GTCTGGTCCC CAGGCAGGAA AGGGGTTTGT TTCTGAAAGG GCTGTTATTG 

85441 TTTTTGTTTA AAAGCAAAAG TATAAACTAA GCTCCTCCCA AAGTTAGTTA ATCCCAAACT 

85501 CAGGAATGAA AAGGACAGCT TGGAGTTTAG ACGTTAGATG GAGTCGGTTA GGTAAGATCT 

85561 CTTTCACTGT AATAATTTTC TCAGTTATGA TTTTTGCAAA GGCAGTTTCA CTGTCCACTT

85621 CACCTCACAT CAGGCCTCTG ACTAGAGGAT TCCAACAATA CTTAGGCCAG GACACCACCA

85681 TGTCTCCTTA TCCACCCTGA GGGAGTCCAA TTTCTGAAAC AAAGGAAACT ATATATGATA 

85741 GTATGAAACT ATATATGAGA AGGAAATTAT ATATGATAAT CAATTTTAGG GTTATCTTAT 

85801 TGATTAGAAG ATATTAAAGT GTGACACTGC CTGGCAATGA TATCTGCTGG TAGTAAGAAT

85861 TTGGCGAATT TAGTGAAATT CCTGAGGCTG AACCTCCACT TCTGTAAAAT GGAGACAGTG 

85921 AGATAATTTG CCTTACAATG CTGAAGTAAG AATTTTACAC AATAATTCAG ACCAACCACT 

85981 TCATGTGGTA CTTGGCCCGT GGAAGACTAT CAATGACAGT TAGTTTATAG TTTATACTAT 

86041 TAATGAATCC TTTGTTTCAT TGTTATTTCC TTCTACACGT TGGCCTCTCT AAAAGAAGGT 

86101 AATATTCAAT ACAAATAAAG TTAAAACAGC TTGCAGAGTT GTCCCAGGGA ACTCACTTAA 

86161 CCACTGAAGT GTTCAAATTG CTTAAGGTTG ACTTTATATT CTCCTGACTA ACCTTTCTCC 

86221 TTCTGGTATT TCTTCTGAGA ACAGCACCAC CATCCAAAGC ATCATGCAAA CAGTGGTCAT 

86281 CCCAGACCAG TAATTCTCAA CTCACAGGGT GCTCCTGCAG AGATGTATTT GAATAGAGTG 

86341 GTAGGATGCT GAAGAAGGCC ACGTAAAATT TGGCCAGTGA TCTGGGGCAG ATTTATCCTG 

86401 AAGCTAATGA AACACAAGTG TAAGGGCCTG TACTTCCAAG GTGCAGAQAG GGGCCCTACA 

86461 AATGTGTTAG TTTGTCTCTC TCTCTCTCTC TGATTTTAAA ATTTGCAGTA TTAAGGTACT 

86521 TTAATCACGG ATGGTTCAGG CTGCTATTTT CACTCAATCC TCCTTTTTAT TAAAATCACC 

86581 ATTGTCTGAT TATGTTAGAA TCCTGATGAA AATATTTGGA ATTTGAGTAA GAGAAAGTTT 

86641 AGTTGAAGAT GTATCTAGTA TGGGGATAAT AAGTTACGTG ATTTGCATAT GTGATCATGT 

86701 GTACTTCATT CGTTGCCAGC CAATCTGACG TAAGAATGGC TTCAAGGAGG CCGGGCGCGG 

86761 TGGCTCACGC CTGTAATCCT AGCACTTTGG GAGGCCGAGA CGGGCGGATC ACGAGGTCAG 

86821 GAGATCGAGA CCATCTTGGC TAACACGGTG AAACCCCGTT TCTACTAAAA ATACAAAAAA 

86881 TTAGCCGGGC GTGTTGGCGG GCGCCTGTAG TCCCAGCTAC TTGGGAGGCT GAGGCAGGAG

86941 AATGGCATOA ACCTGGGAGG CGGAGCTTGC AGTGAGCCGA GATTGCGCCA CTGCACTCCA 

87001 ACCTGGGAGA CACAGCGAGA CTCCGTCTCA AAAAAAAAAA AAAAAGAATG GCTTCAAGGA 

87061 ATGTTCCTAC TGCTCACTGG AATAACTCAC CTAAATTCCT GGCAAGATGC AGGTCTAGAT 

87121 AAAATGTTAT GACATCTAAG TATTCAAAAC ACATTCCCAG CACTGAGAGT GAGTGTCTAG 

87181 TGGAGAGTAG AAACGTATAG AGCCAGAAGC TAGTCTGGAA AGAATTCTTA CAAAGTTTAC 

87241 AACTTACATG TGAAAGGAGC TTAACAGAGG ATTTTCCAAA TTTGAAAACA ATCCTAAAAA

87301 CTTACTTGAC ATTACCAATA ATGTGTTTTG AAACTGAAAT ACTTCTAAGT TATGAAGAAA 

87361 ACATATTATC ATCAGCCACC CTGGAGGAAA GATTGAATTC TATTTCCATT ACCTATAGAC 

87421 AACATTACAA AATAATTTCG ATCTGAAGAT GGAATCAGAG TATTCAOTCA AAACTACAGG 
87481 AAAATATACT TGGTAGTGTC ATATTCAGAA GTTAATAAAA TATGCTATTT TCTGAATTTT
87541 GTGATGGCTG TTGTTTTGTC AGCTTTTATA AAATTGGAAT TTGATTTTAT TTTCCCATTA
87601 TAAATTTATA TTTACAGTCT GCAGTACTTT TGCATTTTTA ATTTTACATT ATAGTTTTTA
87661 ATAGTTAACA AGTTGTAAAA GGTTTGATCC CCAGAAAACC TTGATCTACC CCATCAGTTA
87721 AGTATACTAA TATATTTAGA AAATGGATGA AATCAGCATT TGAATATTTT TAAATATTTA
87781 TTAAAAGAGG ACATGGGTAA AAGAGCTTTG CAGTTGCCAC CCTTCATTCT CAAATTCCCT
87841 GGATAAGGAT GACCGCATAA TCTTTGGATG GTCATACGCA AGTCTTGTGT ACTTGTTACA
87901 TAAATCTATT TAGTGGACTT TTGGCAGTGT GTACTGAGGC CAGTTTCTTC CACCTGAGCT
87961 CTGACTCCAC CTCCAGCAGC CCAAAACCAA TACTGAATTT TGGGGTCAGC TATTGTTTTT
88021 GTGGACTTAQ GTAACTACAC ACACATTGTC TTTATGATAG CTTTAATAAT ACTGCCATCA
88081 GAACTAAAAT TGTCACGTGG ATTAAAAGGA GTGACGGTGG TGTCCCCAGG AGCCTTTCAA
88141 TATGTAAGTA TTTACACATA TACATGCTAA AAAGACCCCT AGGAATTTTT TAACAAGGGC
88201 AAAACAGTAA CTCAGCTTGT TTTCTCGCAG TAAAACCGGT TGAAAAGGCC TGATAGACTT
88261 GTCTGCAGTT ACAAAACTTG TGTGTAGTTA TCACCTTTAT ATCTCCTGGA AACTAACATA
88321 GACAACCGAA TGGGTTACAA CTGTTTTTAA GTGAAATTGT GAGTGQCTCT GAAAAGAGCC
88381 TTTTCAATGA GGAAGAAACG GGCAGACTTA TGCCCTTTCC CCACGGATGC GACGTGCCAG
88441 CTGGATATCT TTGGGCATGA TGGTGACGCG TTTAGCGTGA ATAGCGCACA GATTGGTGTC
88501 TTCGAAGAGT CCCACCAGGT AGGCCTCACA AGCCTCCTGC AGCGCCATCA CCGCAGAGCT
88561 CTGGAAACGC AGGTCGGTTT TGAAGTCCTG GGCGATTTCT CGCACCAGGC GCTGGAACGG
88621 CAGCTTCCGG ATCAGCAGCT CGGTGGACTT CTGGTAGCGA CGGATTTCGC GCAAGGCCAC
88681 GGTGCCCGGG CGGTAGCGAT GAGGTTTCTT CACGCCACCG GTGGCCGGAG CGCTCTTACG
88741 GGCTGCTTTA GTAGCAAGCT GCTTGCGCGG AGCTTTGCCG CCGGTAGACT TGCGAGCTGT
88801 TTGCTTCGTA CGAGCCATTT GCAATGAGAG CACACACAAA AGTGTAGTGA ACTGAGAGCA
88861 AGTGGCCTTT AAATATAGTG AGAAACATTC TGATTGGTCC TGTAATATTT CAAAAGTCCC
88921 GCGCGATAAA ATCATTGGCT GAAGAGTGAC CAGACTGATT GGTTCATTAC TAGACAATCT
88981 TATTGGATGA GTTGCCCCAC CGCCCATCCT GTCCTTTTCG TTTCAGTTAT CTGCAGCGAC
89041 AAATTGTCTA AAATTCTAGT TCATCCAGTC CCAAAGAACA GAGTGTATAA CAAGGTATCT
89101 AAGGATTTTT AAAATGTAAA TTCCGATTCA GTAAGTTTGA GTGGGACTTG AAATTCTGCA
89161 TTCCTGACAG TCTCGCAAGT TATCAATGCT GGTGAACACT CACTAAACCA CCAGAAACGT
89221 TCAGACTCAT GTCGGGAAAT AACGCTTATA TTCAGAGAAT GAGATTCCAT GCTATTTTGT
89281 TACTGGCGAA CAGCAAGTTT CCTTGCCCTT TGTTTTCTAA GTCCAAGTCA CATTCCCACC
89341 CTGCCTGTTC TCAAAATGTC TTATTTTGGT TGGCCTTAAG TTTCACTTTG TATACTCTAA
89401 AATGTACTTT CTAAAGGAAG GTGTTATTTT CTCGAAACTT AACTTTTTAA CACCATTAGG
89461 CTAGGGGGGC GGTGGCTCAC GCCTGTAATC CCAGCATTTT GGGAGGGCGA GATGGGACGA
89521 TCACTAGAGG CCAGGAGTTC AAGACAACCC TGGCTAAAAT GGTGAAACCC CGTCTCGCAT
89581 AAAAATACAA AAACTAGCTG GGCGCGGTAG CAGACGCCTG TAATCCCAAG TACACAGGAG
89641 GCTGAGGCAT GAGAACCGCG TGAAGCGGCG GGGTGGAGGT TGCAGTAAGC CGATATCGCG
89701 CCGCTGCACT CCAGCCTGGG TGACAGAACT AGACTGTCTC AAAACAAACC AATCCAAACG
89761 AAAAGCAAAA AATACCCTAA CAGAAGCAAG TTATCATCCT TTCTTGTGTA ACTATGGACG
89821 GCTCTGAAAA ATGCCGTTTC AAGTGTAAGC TACGTTTTCT GATTTGAGTG TTTACTTGAC
89881 CTTGGCCTTA TCGTGGCTCT GTTATTTTGG CAACAGGACG GCCTGAATAT TGGACAGGAC
89941 GCCTCCCTGA GCAATAGTGA CGTTGCCCAG CTGCTTGTTG ACCTCCTCGT CGTTTCGGAT
90001 GGCCAGCTGC AGGTGGCGGG GGATGATGCT GCGGGTCTTG TCACGTATGG CGCTGCCCAC
90061 CAGTTCTAAG ATCTCGGCGG CCAGGTATTG TAAGTACACT GGCGCACCGG CTCCGACCGG
90121 CTCAAAATAA TTGCCCTTTC GAAAAAGATG ACGGACTCTG CCCTATTGGG AACTGCAAGC
90181 CCGGTAGCGA CGAACAAGTT TTTGCTTTAG CTCCATTTTC CACGTCCGCA AATAGCGACC
90241 TATGAAAGCA GCGGAAAACT GTGAAAGACA AGCAAGCTGG AATGGCGCCT GAACAAATCC
90301 TTTTATACAA ACTGCTU^GGC TGCAATAGGA AGCTATCCTA TTGGTCAATT ATGTTTGGTG
90361 CTTTATCCAA TAGAAAAAGA TAACATAAAT TCCATATTTG CATAAACCCC ACCCCTCAGT
90421 GAAACCGTGT TTCTTTTGTC CAATCAGAAG TGAGGAATCT TAAACCGTCA TTTGAATCTC
90481 AGGACTATAA ATACATGGGC TCTGAACTGT TCTCTGTACT ACTCTGTAGT GGAGAGTGTT
90541 AGTAGCTTTT CTATTCTGTT TAGGAATAGC AATGCCTGAA CCCTCTAAGT CTGCTCCAGC
90601 CCCTAAAAAG GGTTCTAAGA AGGCTATCAC TAAGGCGCAG AAGAAGGATG GTAAGAAGCG
90661 TAAGCGCAGC CGCAAGGAGA GCTATTCTAT CTATGTGTAC AAGGTTCTGA AGCAGGTCCA
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90721 CCCCGACACC GGCATCTCAT CCAAGGCCAT GGGGATCATG AATTCCTTCG TCAACGACAT 

90781 CTTCGAGCGC ATCGCGGGCG AGGCTTCTCG CCTGGCTCAC TACAATAAGC GCTCGACCAT 

90841 CACCTCCAGG GAGATTCAGA CGGCTGTGCG CCTGCTGCTG CCTGGGGAGC TGGCTAAGCA 

90901 TGCTGTGTCC GAGGGCACTA AGGCAGTTAC CAAGTACACT AGCTCTAAAT AAGTGCTTAT

90961 GTAAGCACTT CCAAACCCAA AGGCTCTTTT CAGAGCCACC TACTTTGTCA CAAGGAGAGC 

91021 TATAACCACA ATTTCTTAAG GTGGTGCTGC TGCTATTCTG TTTCAGTTCT AQAGGATCAA 

91081 CTGGAATGTT AGCGAAGACA AGTTTTAGAG CCT^GGTTAA CTTGGACGGG GCCGTGCGCG 

91141 GTGCCTCTTG CCTTTAATCC CGGCAATTTG GGAGGCCGAG GCGGGCGGAT CACGAGGTCA

91201 GGAGATGGAG ACCATCCTGC TTAACACGAT GAAACCCCGT CTCTACTAAA AATACAAAAT 

91261 AATTAGCTGG GCGTGATGGT GGGCGCCTGT AGTCCCAGCT ACTCGGGAGG CTGAGGCAGG 

91321 AGAATGGCGT GAACGCGGGA GGCGGAGCTT GCAGTGAGCC GAGATCGCGC CATGGCACTC 

91381 CAGCCTGGGT GACAGAGCGA GACTCCGTCT CAAAAAAAAA AAAAAAAAAA AATTAAAAAA 

91441 ATATGAAGTT TTGAAGCAGA AATTATTTTG TCGTATGTTC TTTCATAAAT TTTTTGCCTG 

91501 CCTGCCTTCT TCCTTTGTTA CAGAACTCCA ACACTTACCC AAAGGTAGCT GTTGGGTCAG 

91561 GGTTTCTGTA CTATAGTCCC TTCTGTGGTG GCCAGAAATA TGTTACAGGA AAGAGGTCCC 

91621 CATCCAGACC CCAAGAGAGG GTTCTTGGAT CCCGCGCAAG AAAGAGTTCA GGGTGAGTCC 

91681 GCAGTGCAAA GTAAATGCAA GTTTACTAAG AAAGTAAAGT GGTGAAACGA CAACTACTCC 

91741 ATAGACGGAG CAGGACATTC CCGAAAGTAA GAGGAGGAAG GCATCCACCC TAGGTACAAT 

91801 ACTTGTATAT ATGGGGAGAT GTGCTCTGCT ACAAGTTTGT GATAAAGGAT TAATTTTCTT 

91861 AGTTACTATA TTTTGCAAGA ATCAACATTA TTATCTTTAA ACAAAATTAA GAATGCCTTT 

91921 GTTCTCCAGA TATAGGGATA TCTGGACACT CCTAAGTCTG AGTCTGTTTA GTAAACATTA 

91981 TTTATTTGTT CCCTTAACCG TAAACATCTA GAAGCTAGGA ATGACTGACT TTCTGGGAAT 

92041 GCAGCCCAGA AAGTCTCAGC CTCATTTTCC TAGCCCTCAC TCAAAATGGA GTTACTCTGG 

92101 TTCAAGTAAC TCTGACACTT TTCTTCTCTT TTTTTCTTCT TTTTTCCTTC CTTTATTTTT 

92161 TATTTTTTAT TTTTGAAATA AGAAATCAAG AATACTTGAT GTTTCATCTA AAACAATACC 

92221 CATAATTGAT AAGCCAAAAC AAAAACCTAG GTCTTCTAAC TCAAAACTAG GATGTTTTGC 

92281 TGTCTCTGCT GATACTCGGC TGATCGTTAA TAGGTAATTA ACAAACAAGC CTTGCTATGT 

92341 CCCCCTCAGT TTATTACCAT TAGATCATAT GCCTACTGTC AATCATATTA ATCCACAACT

92401 ATGCATTTCA CAAAACTTGC CATAAAAATT CACAGGTTTC CCGCTTCCCT CGAGTTTTCA 

92461 TTTCCGAAGG GTCCCATGTA ATATAAAACT TATATTAAAT ACATTTGTAT GCTTTTCTCT 

92521 TGCTAATCTT TTTTTTTGTT TTTTGAGACT GAGCCTTGCT CTGTCACCCA GGCTGGAGTG 

92581 CAATGGCGCG ATCTCGGCTC ACTGCAACCT CCGCTTCCCA GGTTCAAGCG ATTCTACTGC 

92641 CTCGCCCTCC CGAGTAGCTG GGACCACAGA TACGTGCCAC CATGCCCCGC TAATTTTTGT 

92701 ATTTTTAGTA GAGACAGGGT TTCACCGTGT TGGCCAGGAT GTTCTCAATC TCCTTACCTC

92761 GTGATCCGCC CGCCTCGTCC TGCCAAAGTG CTCGGATTAC AGACGTGAGC CACTGCACCC 

92821 GACCAATCTG TCTTTTTGTA GAGGGGCCTC AAGCATGAAC TTACTGATGG GTGAGAAAAA

92881 CAGAATTTTC TTTTCCCCTA CAATATAAAC ATTAATTGTA ATGTTATCAT TCAGGACATT

92941 TTGGTGACCA ATCTTACAGA AATTTTATCT TGTGCAAGTC TATGCAAACC AATATGTAAA

93001 TCTTCTATAA GTGAGATTGT ATTTCACTTT TCTAGTATCC TTTTAAATTA ATAAAAGAGA
93061 TTCTAATGAT TATTTTCATT ACTGCATTTC ATTGTAGGGA AGTAGATAAT TGCCCTTTAT
93121 TCACTGACCT TCGCTTTTTA AAAATTTAAA CCATGTTACC ATGAAAATGC TTTTCAGTAT 
93181 TTCTCTACAC ACAAGATTGC TGTAAGGGCA AAAATAGAGA TAGGAATCAT GCATCCATTG 
93241 ATATACATAT TTTGATTTTT AATACATGTT ACCAAGTTGC CTCCTGAAGG TCTGTTTACA 

93301 CTCTCACCAA CAGGGTGTTT TTTCCTGACT TCCACAAATG CTCTTGAACA GTGGGTGTGT
93361 TAGTCTGTTC AAATTGCCGA CATGAACAAT TAAATCTCAT TGTTGTTTTT ATTTTTAAGA 

93421 CAATTATTGT TTGAGACTGC ACATTTTGAT AATAACATTT CTTCTATTAT GGTTTGATTA
93481 CTCATGATTC TTGCCCATTT TCTTTTGGGA TGTTGCCTTA TGTACATTAT TTTAAATAGA
93541 TAGCTCCATG TATTAAAAGA TTATTAAGTT TGAGGGCTTA TGATATGTCA GTTACATTTC
93601 TAAGATTTTT TTTTTTTTTT TTTTTGAGAC GGAGTTTCAC ACTTGTTGCC CAGGCTGGAG
93661 TGCAATGGTG CGATCTCGGC TCACCGCAAC CTCCGCCTCC AGGGTTCAAG CAATTCTCCT 
93721 GCCTCAGCCT CCCCAGTAAT TGGGACTACT GGCAAGCGCC ACCACGCCTG GCTAATTTTG
93781 TATTTTTATT AGAGATGAGG TTTCTCCATG TTGGTCAGAC TGGTCTCGAA CTGCCGACCT 
93841 CAGGTGATCC ACCCGCCTCG GCCTCCCAAA GTGCTGGGAT TACAGGTATG AGCCACTGGG 
93901 CCCGGCCACA TTTCTAAATT CTTTATAAGT ATAAATTCAT TCAATCTTCA CCAAAACTCA
93961 ATGAAGTGTG AGTACTATTA TTATCATTGT TTTACAGATC AAAACAAGTA ATACAGTCAC

94021 TTACTGAGTT CTATACACCT GGTAATTTTT TTGTTTCGTT GTTCTATCAA TTATTGGGGA
94081 AGGGGTGTTG AAATCTCTAC CTTTAAATCA TGTATGTGTC TATTTCTCCT TTCGGTTCTA
94141 TCAGGTTTTG CTACACATAT TTTGCAGTTC TGTTATTTGG TGCATATACA TTTAGAATTG 
94201 CTTGTTTTTC GTATTGGATT GACCCTGTTA TCATTATGTA ATATCCCTGT CTGTTCCTAG 
94261 TAATTTTCTT TGCTCTGAAA TATACTTATC TGATATATCA TCCAAAAGAC CACCAGGATG 
94321 GCTAAAGAGT AGAAAGGAGA GATTTACTGG CAATACTAAT TTGCAAGCCA GGAAGAGATG 
94381 GTCCCAGAAC CTGCCAAAAT TACTCTCTCT TTGGGGAGAA GGAGCAGGTT GGTTATTTTT 
94441 ATGCCTCATA GGCTATATAT TACACAATAG AGTCATACAT ATTTAGCACG TTTGGGGGGA 
94501 CAGCTATATA TATTATGAGG GGTGCCAAGT GCATTCACAA TGGATAAACA CGTGTAATAT
94561 ACCTCCCATG TTCACTTCGA GGTTAAATTT TGGTTAAAAT GAGGTAGAAT TTAGGTCTTT
94621 ACATCACAAG GTGAACTATA GGAACAAAGT TTACGTGCTG CCTCTAGCAG CTGGCTGAAA
94681 ATGGCTTAAG GTCTACAATT ACGTGTAAGA ATAGAATGTG TGTCAAGGCG GTCCTCTGTC
94741 CAATCAGAGT TGTAGTGGAC TGGACTGTAA ATCAGAGTTA GGAGGGCTTC TGATAGCTCC
94801 TATAGTTAAG GAATTTAGCA AGTGTGAGTT TTTTGGTAGT CTTTGGAATT TAGGAATTTG
94861 CCATGCCAGC CAAGCCATGA ATGCTCTACC AGTAGGTAAC TTTGTTTGCT TAATCTTAGA 
94921 GTCTGTCTTA GTTGGTATAG GGGCATCTAT TTTGGTCTTT CAGATCCCAG ATATTATTAA
94981 TACAGATACT CTTGCAGTTT TGGGCTGATG TTTATATGGC TTATCTTTTT TGCAGCCTTT
95041 AATTTCAACC TGCGTTATGT TTATATTTGA AGTGAGATTC TTGCAGACAG TGTACAGTTG 
95101 TTGTTTTTTT TTTTTTGAGA TGGAATTTCA CTCTTGTTGT CCAGGCTGGG GTGCAGTGGC 
95161 ACAGTCTCAG CTCACTGCAA CCTCCGCCTC CTGGGTTCAA GGGATTCTCC TGCCTCAGCC 
95221 TCTTGAGCAG CTGGGATTGC AGCCATGCGC CACCACACCC GGCTAATTTT TGTATTTTTA 
95281 GTAGAGACAG GATTCACCAT GTTGCCCAGG CTGGTCTCGA ACTCCTGACC TCAAGTGATC 
95341 CGCCAGCCTC GGCCTACCAA AGTGCTGGGA TTACAGGTGT GAGACCTCGC GCCCAGCCAA 
95401 ACTGTTTTTT TATGGGTGTA TTTATACCAC ACACATTTAA TGCAATTATT GATATCTTAG
95461 GGCTTAAGTT CATGAAGGGT AGTGTGGGAA CCATAGTCTC TTGGCCCACT AAATCTTTGC 
95521 CAGAAATCAC TGACAAGGCA GATTGATTAA TAGGTGAAAA GGCATTTTAC CTATTGTTTA 
95581 ACGTGTCTAT GTGGGAGCAT TCAGAATTAA TTACCTAACT TCCCAATGAG TTATAGATGC 
95641 TTATATACCA TTTTTAGATC ACAGAAAGAA TTGGGGCTTA GATTCTGGTA AAACAGGTTA 
95701 TGGGAGGCAA AAGAGGTTTG GCTTGCAAAG GTGGCCTTGT TAGGTAGGTG AAGCCTCCCT 
95761 CAGAAAGAAC AGATGGTAAA TGTTTCTTTT ATGATTTTTA AGTGTCAGAC TCTCAGTCTC 
95821 TCCTGGATCT GGGGAAAGGT ATAGAAAGGT GAGGAGGCAT GGCTGCATTA ATGGAGATTC 
95881 TCTACAGATG TAAAATTTTT CCCATTTAAG GCAGCTTTGC AAGCCCATTT CTGCCTGCTG 
95941 GCCAAGCAGC AGCCATTTCA AAATATGTCA AAGAAATATA TTTTGGGGTA AAATATTTTG
96001 ATTTCCTTTA GACTGGTGGC CTTATAAGAA AAGGAAGAGA CACCTGAGCT GACACACATA 
96061 CCCTTGCTCT CTCAACATGT TATGATGCAG TAAGAAGGCC CTCACCAGAT ACTAATTCCA 
96121 TGCCCTTAGC TTCCCAGGTT CTAGAACAGT AGGAAATAAA TTTCTTTTCT TTAAAAGTTA 
96181 GCCAGTCTGT GGTATTCTGT TATAGTATCA CAAAATGGAC TAAGTAACTA TATTATGATC 
96241 ATCTTACATG ACTGATCCCT CCTACATCAT ACACATACAC AGGCCACATT TGGAACATTG 
96301 TTAGAGGTTC CTCTGCCCAG TACAAATGTA CTACAAATTA TATATGTATT TTTAAATTTT 
96361 TGAGTATCTT CAATAGTATA TTTTCGTTAA CTTTTGTAGT CAAAATGTCA TTATAACATG 
96421 TATTCAATAT GCATAATTAT TAGTCAGATG TTTTACATTC TTTCTTCATA CTAAGTGATA 
96481 TGGTTTGGAT ATTTGTCCCC TCTAAATCTC ATGTTGAAAT GTAATCTCCA ATGTTGGAAG 
96541 TGAAGCCTGG TGAAAGGTTT TTGGATCGTG AGGGTGAACC CCTCATGAAG CGCACTCTTC 
96601 AGGGTAATCA ATGGGTTCTC ACTTTGAGTT CACAAGAGAT CTGGTTCTTT AAAAGAGTGT 
96661 GACACCTCCC CCATCTCTCT CGCTCAGCTC TCACCATATG ATATGCCTAC TCCCTCTTCA 
96721 CCTTCCACCA TGATTGGAAG TTTCCTGAGG ACTTGCCAGT AGCAGATGCC TGCACCACAC 
96781 CTCCTGTACA GCCTGCACAA CCGTGAGCCA AAAAAAATTA CTTTTCTTTA TAAATTAGTC
96841 AGTTTCAGGG ATTCCCTTAT AGTAATGCAA GAACGAACTA ACACACTAAG TCTATTTCAT
96901 ATTTACAGAA TAGCTCAATC TGAAGTACCC TTTTTCAACT TCACAGTAGC TACTTGTAGC 
96961 TAGTGGGCAC TGATTTGGAG CGTGTTCAAG GGTGAATTGT ATTATGCAAT TAACAGATTT 
97021 TTTTTATTGT TTTCGCAAAC CACGAGGCAT AGATTGTCTT ACTTTCTCTG CTCCTGGTGT 
97081 TGGAGTTGTT ATTGGGAAAC AACTTATTTT CCTCTTATAT TTATATGGAA TAAATAACCC 
97141 CCAATATTTC CCTCCCCAAT ATCTGCCTTT TGTATGTTTT TTGAAGGCAA GTGCCTAGAA 
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97201 TTTACTGTTT TTGAAGCACT TACTGAAAGG ATTGCCATCA AGTTGTTTTG CTAATAGTAC 

97261 ATGCCAGGCG CTTGTTGGTT TGCTTAATTC AAGGTAACTT GGATGAGAAG AAGAGTTTTT 

97321 CTCATCCATG GCTCAGTGGA GTATAGATTA CTGATATTGT GACTGGATGT ACTCCTGCTT 

97381 TCTAGTCTGA GTTTTTGAAG CTACCCTTAA TCTTGGTTTC AATTTTATCT AGCCCTGTAC 

97441 ATATCCAAGG CTCTTTCCAA AATGGTCTAC GATTTGTTTA GGAAGTTAGA ATAGCTGTAC 

97501 TTTCTGAACC ACGGTTCCTG ACATTTTCTG GACTTCAAAC ACATCCAGCA TTTTATCGAA 

97561 GTATTTATCC TTCCTACTTG GCTGGCTTCT TCCTTGCCTT CAGGTCTGAA TTCAAATGAC 

97621 ATTCTCCTGA TGAAACTTTC CATCCTTATT TCTATTCTTT TTTCTTATCC CCTTTCTTTA 

97681 TTTTTCTCCA CAGCACTCAT CACTTATCTC TACATTTTCA TTATGTATTT ACCTTATTGT 

97741 GCACCTCCCA CTACAAGACA AGTAGCACCG TAAGGAAACA GGTTGTCTGC TTTTTCACTG 

97801 CTATGCTCCC TGCACCTAGA ACACTCTCTG GCACTTAGCA GGTTTTCAGT AAATATATGC 

97861 TGAACTAATA ATGCTGGATA TACATCTCCC TCATGAACTC TCTAAATCCT TCTAATTTAC 

97921 ATTGATCAAT CTTCTTTTCC ATGTGCTTTT GTATGATTTA TTGCTCAAAA TCTTTATTTT 

97981 ATATGCAGAA CGTGCACTGC TATTTAATCT TCATGTACGT AAGTCCTCCC TTCTCTGAGT 

98041 ATAATCTCTT CAGGGCACTA TCTGAGATAA CTTTTTAACA TCTCCATCAT GAATCTTGTA 

98101 CCTTTTCAAA GAAAATGAGC CAGTGATTAC TGATGTTTAC GGCTATTGTT GAGGGTGAAG 

98161 ATCATTATAA TTTTGAAAAG GGAAGTTGAA TATTGTGAAG GGAAAGATAA CACTAGAGTC 

98221 AGAAGACTTG GGAGAAGGCA AAAAACAAAC TAAAAATGAG CACTTTTAGT CTCCTGACAG 

98281 TTTCTCTGAA TCAAATCCAT AGTTCTGTGA CAGCGTTGGC TTAGAAGCAG ATTTTTTTTT

98341 TTTTTTTTTT TGAAATGGAG TTTCGCTCTT GCCCAGGCTG GAGTGCAGTG GCACGATCTC 

98401 GGCTCACTGC AACCTCTGTC TCCAGGGTTC AAGCGATTCT CCTGCTTCAG CCTATGGAGT 

98461 AGCTGGGATT ACAGGCTCCC ACAACCACGC CCAGCTAATT TTTTGTATTT TTAGTGAAGA 

98521 CTGGGGTTTC ACCATGTTGG CCAGGCTGGT TACGAACTCC TGTTCTCAAG TGATCTGCCC 

98581 GCCTTGGCCT CCCAAAGTGT TGGGATTACA GGCATCAGCC ACCGTGCCCA GCCAGGAGCA 

98641 GATTTTTTTA CACTCATGTT TCTTTTTCCT TCTGTCATCC TGTTTCAGTA TAAGCAGACC 

98701 ACAGATAGAA GTAGTAGATA CCTCAGAAAT TCCTGGAATA ATTAATCCAC GTTCATCTGT 

98761 ACTCCATCTG CTCCTATCTC ATGGAATATA AAAGGAAAAA CACCAAGATT TCCCTAGGCA

98821 ATCTGTCTTG ATTTTAGGTT CCTCAACAGG AGAGCCAGAC AATGGCTGTA ATAATATTGT 

98881 CCCGGCCAAG GAAAAACTTC CCCTTTGCCC TCCCAAGGTT TATGGAAAAT TACTGGCAAA 

98941 ACACAGATTA ACTGGAGAAA AGGCATATAT ATTTATTTCA TCACAATTTT ACAGGAQATT 

99001 TTAGAATTAA GACTGAAAGA TACAGGGGAA ATTGCCCATT TTTATGCTTA GGTTCAACAA 

99061 GATAAACAGC TGTATAGGGT ACGATCTAAT GCTAACAGAC TGAGTGGGGA AGCCCCGCAA 

99121 GGCTTGTCTG TCAAGATTCT TCTTGACCTC TCAGTGCAGC ATTTCTTCCT TCTGGTTATA 

99181 GGACAAGACT CTCTTTTAGA ATGGGGGGTC TTATGACCTA CAGGCAAACA AGGTAGGTTA 

99241 GAGTAATACT TTTAGGTTTT ATGGCTGGTT CTAGGGAAAA GGAGTTCTGG TTTGTATGGC 

99301 CTACCTTGAG GAGGAATTCT GGTTTCTATG GCTAGACTTT GGGGAGAATG GGACTTACAG 

99361 ACAGGAAGGC AGAAGGTGGT CAGTGAAACA CTTTTATAAT CATAATCCCA TTTTGAGTAT 

99421 TTCTGTGTTA TGGAATGTTT GTTCTCTCAT TTCCTGAAAG ATTCCAGAGA CTCCTCATTC 

99481 AGTGTTGTGA AAAAGTTCAG GAAATGCAAC TCAAAAATGT GCCACTTTGT TACGCTGATT 

99541 TCTTTGAACT GAGGGCACCT AGGAAACAGT AAATTCAAGG AAGGGCTTTC GCTGAACTCT 

99601 AATCAAAAAT TTGAAAATTA AAAAAAAATT CAAAAAGGAA TTTAGTTGTT AAGATTCACT 

99661 TCCCTGGGGA ATCTCATCAA CCAGAGAAGA TTAACTQTAT CACAGGAGAG GAGACTGGTG 

99721 GTTAACACCA TCTAAACAGA CTTTGTCACA GCTGTCACCT ATTCTTTGAA ACACCCATTT 

99781 ATTTTTCTCC AAAATCATAT ACTCTCCCCT AAGTTGCCTA CATCCCCCTT CTTTCTCCCT 

99841 TATGAATCAA GAGAGCTTAT AAGCTTCTAC AGTTCACTGG GATTTGGGGT ATTCGCTTTT 

99901 CTTCCCTCCC ACTCCCCCTC CCCTTTTTTT GTCTTTGAGA CACAGTCTTC TGGCTCTGTC 

99961 GCCCAGGCTG GAGTGTGGTG GCTCTATGTG AACTCACTGC AACCTCCTCC TCTCGGGTTC 

100021 AAGCGATCCT CCCACCTCAG CTTCTCGAGT AACTGGAACT ACAGGCGTGC ACTACCAAGC 

100081 CCGGCTTTTT TTTTTCTTTT TCTCCCCCGT TTCT T TT l ' T Q GTTATTTTAC TGGAGACAGG 

100141 GTTTCTCCAT GTTGTCCACG CTGGTCTCGA ACGCCTGACC CGCCGTCCTC GGCCTCCCAA 

100201 AGTGCTGGTA TTACGGGCAT GAGCCACTGC GCCCGATTTG AAGGACCTCT TAAATATCTA 

100261 TTTAGAAATT GGTCGGAGTC CACTCCTTTC CAAAAACATG AGTCACAATC CGGGAAAAGC 

100321 ACGAGCGGCT GAAAGTCAAA ATAACCAGAA CAAAACCTCC ACTCATGCTT AAAAAAGGTA 

100381 TTTTGACAAA ATCCTAATTC GGCCAATTAT TATTAGTATT CAAGTCGAAG GCTCGTCAAG
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100441 CCAGACTGGG GATTGGGTCA AACATAAACC TTACACCAGA CGGAAGGATT ACATGCAAAT 

100501 GAAGGATGCA GATTCTGATT TCCCATTGGG TATTTGACAT TAGCCAATGG GAGAATTCCT 

100561 CACAGCCTAC CTCCAGTCAG TATAAATACT TCTCTGCCTT GCGTTCTAAT GTAGTTTCAT 

100621 TACATTTTCT TGTGGCGATT TTCCCTTATC AGAAGTAGTT ATGTCTGGTC GCGGCAAACA 

100681 AGGCGGTAAA GCTCGCGCCA AGGCTAAGAC TCGGTCTTCT CGTGCAGGTT TGCAGTTTCC 

100741 TGTGGGCCGA GTGCACCGCC TGCTCCGCAA AGGCAACTAC TCCGAGCGCG TCGGGGCTGG 

100801 CGCGCCGGTG TATCTCGCGG CGGTGCTTGA GTACCTGACC GCCGAGATCC TGGAGCTGGC 

100861 GGGCAATGCG GCCCGCGACA ACAAGAAGAC CCGCATCATC CCGCGCCACC TGCAATTGGC 

100921 CATCCGCAAT GACGAGGAGC TTAATAAACT CTTGGGGCGT GTGACCATCG CGCAGGGTGG 

100981 CGTTTTGCCT AATATTCAGG CGGTGCTGCT GCCTAAGAAA ACTGAGAGCC ATCATAAGGC 

101041 CAAGGGAAAG TGAAGAGTTA ACGCTTCATG CACTGCTGTT TTTCTGTCAG CAGACAAAAT 

101101 CAGCCTAACA GCAAAGGCTC TTTTCAGAGC CACCTACGAC TTCCATTAAA TGAGCTGTTG 

101161 TGCTTTGGAT TATGCCGCCC ATAAAGATGT TTTTGAGGTG TTTTTAATGG CTTTGAGTGT 

101221 GGCACTTTTA GTAATTTGTC CTGCAGAAAT TAGATCCATA GAAACCTCAG GAATTCTAGG 

101281 TATGTGGGAG AAGTGCCATG CAGCACAAAA CATGTTTACA GGGGTGATTC GCGTTAAGTT 

101341 TCACACACAG CAQTTACTAC ATTTTAGAGG AAGGAAATTA TACCCATGAG TGCATTCCTA 

101401 ACTATCTTGA ATGGAAGTGT TAAAACCCGC ATGCCCCACA CAAGTTTGAA TATGTCATAC 

101461 CATTTGCTGT AGCAATTAAT GGCATACACA ATTGAGAGCA CACACATTAC CACTGAACAT 

101521 TTGAGTATGT ATTTCCCAAA ATGAGCTTTT TTCCAGTTTG GGGATGTTTT GCTTTGTTTT 

101581 GGGGTGGAGT CTCCCTCTCG CCCAAGCTGC AGTGCAGCGG CGTGATAACA GCTCACTGTA 

101641 ACCTCGAACT CGGGCTCAAG CGATCCTCTT GACAGCCTTC TGAGTAGCTG GGATTACAGG 

101701 CGAGAGCCGC CACGCCCGGC TAAGAGCATT TTTCTAATTG CCCACACTTC TTATGCGACA 

101761 CCCAGAAAAA TACAATTTTA AATAAAGCGC ATATGCAAAT TTCCCTAATC GTCTCCAATA 

101821 TTCTCTGATT TCTTTTTTAT ATTTTAACTA GAAACAATTG GAGGTTTCCG CGTTGCTTTG 

101881 TGTGGTTGTA AATTTTAAGA CTTCAGGAAA CTTTTCCAGT ACAAGACTTG TCCACAGTGG 

101941 ATATAGCAGC TAAGGGGTTA ACAAAATGAC GTCAGAGTAG CTACGGTAAT GGGCAGGAGC 

102001 CTCTCTTAAT CTGCAACCAG GCACAGAGAT GGACCAATCC AAGAAGGGCG CGGGGATTTT 

102061 TGAATTTTCT TGGGTCCAAT AGTTGGTGGT CTGACTCTAT AAAAGAAGAG TAGCTCTTTC 

102121 CTTTCCTCCA CAGACGTCTC TGCAGGCAAG CTTTTCTGTG GTTTTGCCAT GGCTCGTACT 

102181 AAACAGACAG CTCGGAAATC CACCGGCGGT AAAGCGCCAC GCAAGCAGCT GGCTACCAAG 

102241 GCTGCTCGCA AGAGCGCGCC GGCTACCGGC GGCGTGAAAA AGCCTCACCG TTACCGCCCG 

102301 GGCACTGTGG CTCTGCGCGA GATCCGCCGC TACCAAAAGT CGACCGAGTT GCTGATTCGG 

102361 AAGCTGCCGT TCCAGCGCCT GGTGCGAGAA ATCGCCCAAG ACTTCAAGAC CGATCTTCGC 

102421 TTCCAGAGCT CTGCGGTGAT GGCGCTGCAG GAGGCTTGTG AGGCCTACTT GGTAGGGCTC 

102481 TTTGAGGACA CAAACCTTTG CGCCATCCAT GCTAAGCGAG TGACTATTAT GCCCAAAGAC

102541 ATCCAGCTCG CTCGCCGCAT TCGCGGAGAA AGAGCGTAAA TGTAAAGTCA CTTTTTCATC 

102601 AGTCTTAAAA CCCAAAGGCT CTTTTCAGAG CCACCCACTT ATTCCAACGA AAGTAGCTGT 

102661 GATAATTTTT TGTTGTCTTA ACAGAACAAA TTTCTAAGGA CCCCCCCGGA AAGCATTAGA 

102721 CTATGGTCTT AAAGTTGATT AACAGAAATA ACGGTTTGGT CAGTCTTGCA GTGTAGGTTA 

102781 TTTCTGACCT TATTAAGGTG CTATTTGGAG AGAAGCTGTG TAAGTCCACT ATCATTCAGG 

102841 CCTCTAGCTT GCTATGATTA GCATTTGTTT AAACAACTTT GTAAGAGTAA GGGAAAAATC 

102901 TGGTAAGTAG TTAACTGGCG CTTACTAGGC ATTTTTGCAA AGCTTTGAAA AGATTAGAAA 

102961 ATTGTGTCTT GCGAGTTCCA GTGTCTTCCT CAAAATGCTT AGGAAGATTT TCTCAGCTCA 

103021 ATACATAGTC CCCTAGGTTT TCTCATATAT TATATATATA TATATATATA TATATACTGT 

103081 TAAATTCATT TGGCTGTTAA CATTAACCTG AAATTTATTC TGGTGCAAAA TGTGAGGCAG 

103141 GGATCTAACT GGCTCTCATT TTATCCATAG CTAGCTACCC ACTTTAAATC TGTCAGTCTG 

103201 TCGACCAAGC ATAATTTAAT CCCTTATATA TGAATTTTTA TATGTGTGGC TTTGCTTGTA 

103261 AATAGTCTAT CTGGTTGCAT TGCTTTGTCT CCTCTAGGAC TATGCACCAT GACATGCCAC 

103321 ATTCTTTTTT TCAGTACTTC TTGCCTGTAG TTATTAAAAT CTAGAATTTA CAAGTTTTAA 

103381 CCATTTTCTT TCTGTTGATC TTGCTTTTCG GTTTTGGAGG TTGGGGATTG AGTACTGGAA 

103441 GAAAATTTAQ AGGGATGGGA ATACTGTACG CAAACAAAAG TAATATTTAC TTTAAAATTT 

103501 TTATATTTTG TATTTTTTTA TCATATAGCT TTTACATCAC ATTTTACAGA CTAACTTTAG 

103561 AACAACCACA GAATGTCCAA CATTAAAACT ACTAATTCCA AAGACCTTGC CTCACATTCT 

103621 TTTTTACAAT AAATATTTTT TACACCTAAC ATTCTTTCTT GGCCTACATC TAGAATGTAA 
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103681 ACTGATGTAC CATACTAAAA TCGCCTGACC AACTGTCAAC AACAACAAAT CACACACACA 

103741 AAAGATTAAA TTTGAATTGC ATCGTTTACT TAAATTCATT TGTGTTCCAG CTTTTAATAA 

103801 GGCAGTTTTT GGTTTATAAA GTAATATTTG CATTTTAAAA ATTATGAAAA TGAATATGTC 

103861 AGTTTGTTTT ATGATTCGTT TTTCTTGACT CTTATACAAG CGACTCTAAC TGGCATAGAC 

103921 ATTTGTTATC CACAGACAGT ATAGATATGT TAGAGATGCC AATGGACTTG GTCTATGCCA 

103981 AGGTGACTAC TCACAAGCTC TGGGCCCAGC TGAAGGTCAA GTATTTTTTT TCCAGTTATA 

104041 GATGTGCTGG ATCTGATGTA TAGCGCTTGA CTTTTTATAT TTTCTTTATC TGTAGGAAAC

104101 AAATGTGTTG GAGGTACTGG GTCTGACGAA TAGCATAAAA GAATAAAGTT ACATTACTGT 

104161 CTGAGGATCA GATGGACAGG GGGTGGTAGC TCAGTCCAGC TATTTTCCAC TCCCTCACTT 

104221 ACATTCTTTG CCCCCTCCTC AACAGAACAA GGATTCTGCT GTAACTCTTC ATTGACAGTT 

104281 GATATTTAAA AATTAACGAA TGGATGAAAT TCTCATTTGT GAAAGAAAAT TTATTGAGCA 

104341 TTTTGTATTT GTGAGTAGTG CAAACATTTT AATATTATAT TAAGAATCTA TTGTTTTGTA 

104401 TTAGAGGAGT AATTAAGGAG AGATTGGAGA CAAAAAGGGG GTGTTGTTTG CAGAATATAC 

104461 CATCCAAAAA TAGACCACTG TGGGATCAGG ATTCTTTTGA GCTAAAGGCA CTTCAAAAAC

104521 AGCATTCAAG AAGGGAATTC TTCTAAACTT TTCTTTCTGA AAACAGGAGA TAAAAGTTCC 

104581 AATGTGAAAA ATGCTCTGCT TGTACCAGGT GAAAAGACAT ATTCTTCAGC CCAGAGGCAT 

104641 AGATGAGATA ATTCTGCACA AACACAGCAG GGAGTCATAG CCGAGAGACT TCTATACACA 

104701 AACAAACCTT GTTAAAATAA TCATATATTC CTTTAATCTC CTCATATGGT TTACTTTCCC 

104761 ACAATTGCCT CTCTTTAACT TAATGTGAAA GCATTTAGCT TTTGCCZATTT CTTTGGGGCT 

104821 TCACTTTTTT ATGAGGGTTC TCCTGTCCCA TAAAATTTAC ATTAAATACA TTTGTATGCT 

104881 TTCATTCTGC TAATCTGTTT TATGGCAAAT GAATTATCAG GTCCAGCTGG AGACCCTAAC 

104941 AGAGTAGAGG TAAAATTTTG CCTCCCTACA AGATAGAGAT TGTGTGCATT AAATGTTGTT 

105001 TGTTCCCAGT TGTTCAGTTT GTCAGGCCTC TGAGCCGAAG CTAAGCCATC ATATCCCCTG 

105061 TGAACTGCAC GTATGCCTCT AGATGGCCTG AAGTAACTGA AGAAACACAA AAGAAGTGAA 

105121 AATGCCCTGT TCCTGCCTTA ACTGATGACA TTACCTTGTG AAATTCCTTC TCCTGGCTCA 

105181 TCCTGACTCA AAAGCTCCCC CACTGAGCAC CTTGTGACCC CCACCCCTGC CAGCCAGAGA 

105241 ACAACCCCCT TTGACTGTAA TTTTCCACTA TCTACCCAAA TCTTATAAAA CGGACCCACC 

105301 CCATCTCCCT TCGCTGACTC TTTTCGGACT CAGCCCGCCT GCACCCAGGT AGAATAAACA 

105361 GCCTTGTTGC TCACACAAAC CCTGTTTGAT GGTCTCTTCA CACGGACGCG CCTGAAACAG 

105421 TTTAACAGGG TTTTTCCTGC CCAGTCACAA CAAAGTGATG TTATGCTGCA GGCTGAAGTT 

105481 TACAGCTAAT GCTGTTGAAG TCTAAAATCA GTTTTGGTTT GTTAGATTTG GGTGAGATGG

105541 CTAAGATTCT CAGAGAAAGA AGTCAAGTTT GGGGTGCATT TTTCAGACTT AAAAATTTAG 

105601 CAGTAGCCCT TGCAGTTTTT CCAATAGAAG TGATTTAAGA ATGTTTTCAG GAAATTTAAA

105661 ACAACAGTGA GAAGCGTGTA TGGAGAGTTG AACTACACTC CAGACTTGGC TATAGGAAAG

105721 CACGAATGCT GCTATTGTAT TGCACCTTGG AAAAGAGAAC AAAGGAATAT TTTCGGACAA

105781 TTTTAACATG TCACATATGA AAAGCTAAAC GGAATCTGTC AACACCTTGT ACGTTATTAC 

105841 AGGCTGTGAT TTTAAAAAAA CAATCCTTAC TAATACATAC ATAGTTGCTG CTAGCAATAT

105901 AGTGTTGGGA GTAAAAACAC GAAAATGAGA GTTCAGGACA ATATCCCAAC TCTGAGCAGA 

105961 TTTTTTTAAG TAGTAACATC TAAAATTAAA CCATATTATG TAATATTTAT TTCTTTTCCA 

106021 CAGTCTCTTC TCATGCCTCG TTCACATTAG CTAATTAAAA GTCCCCTGAG TATCATCATA 

106081 ACCCGATTTA CAGATGAAGG CACGGTTGCA ATGAGCTATC ACCCTCTTCT GAATGAGACA 

106141 GTACAGTGTG AAGGATAGCA AAACTCCACT CCCATCCTCT TAGGGCTCTG GCTGGACCAG 

106201 CAAATTAAAT TAATGTAAAA TGGATTAACA GGAGAAAGGT ATATGCATTT ATTTAACACA 

106261 GGTTTTACGT GACACAGGTG CTCTCATAAG GTAATGAAAG CCCAAAAAAA GCAGTTAGCT 

106321 ACTTATATAA TGAATTGGAC AATTAGTAAA ATGTAAAAAT GCGCTAAAGC AAAGGGATTT 

106381 AGGCTAGAAT ATATAACTGT GTAGAGAAGC GCCCAGCAAG GGCTAGTGCA AGGTTTGTAC 

106441 AGAATTCTCT TGGCCTCAGC CTCCTATCCT TGAGAAGAAT GTTGCTTTTT TTAAACTACA 

106501 GTGAGAACAT CTTTCATATG AGAATTTCAC CTACTGCTTC TAAGAAACAG GTCAGCTTTC 

106561 AAGAAAACAT AAGGCCAGAG TGATCTTTTC ACGCCTGCTC TTTTAAGTAC CTTTGAATAG 

106621 TGAATATGTC TTCAAGCACT TGAAAGACTT AAAAAGTTTA CCACTCCGGC ATATTAGTGA 

106681 AAGCCCTTAA TATAAGCCCT TATTAAAATT CTCAGTCGAG GGTATAAATT CAGATTCAAA 

106741 TAGTAGTGTC GTAAACGGGA GGGAAAAACT AAAGGGATTA AAAAGTGAAA CTATTGTGTT 

106801 CTCCCTCGCA GTCCTTAGGT CACTGCCCCT CGAGGGGCGG AGCAAAAAGT GAGGCAGCAA 

106861 CGCCTCCTTA TCCTCGCTCC CGCTTTCAGT TCTCAATAAG GTCCGATGTT CGTGTATAAA 
106921 TGCTCGTGGC TTGCTTTCTT TTCGCGTACC TGGTTTTTGT TGTCAGCTGG TTAGACATGT
106981 CTGGTCGCGG CAAAGGCGGT AAAGGTTTGG GTAAGGGAGG TGCCAAGCGT CACCGAAAAG
107041 TGCTGCGGGA TAACATCCAA GGCATCACCA AACCGGCCAT TCGGCGCCTT GCTAGGCGTG
107101 GTGGGGTTAA GCGAATTTCC GGTTTGATTT ATGAGGAGAC TCGTGGCGTT CTCAAGGTGT
107161 TTCTGGAGAA CGTGATCCGG GACGCCGTGA CCTACACGGA GCACGCCAAG CGCAAGACTG
107221 TCACTGCCAT GGATGTGGTT TACGCGCTCA AGCGTCAAGG ACGCACTCTG TACGGCTTCG
107281 GCGGTTAATC TTTTCGTCAG TTTTCTTCCA ATGGCCCTTT TCAGGGCCGC CCACTCCCTC
107341 TCAGAAAGAG CTGTGATTGT ATTCTTTCGG ATGGTAACAT CTCAATGGCT TTACTCGGCT
107401 ATTCTGCCTA GTATGTAGAA CTATTATAAA CCAGTTGGGA GAGACCAGGT TGTTTGGTCT
107461 GAGTGGCTGC TAAAGCAGAA ATCAGCTAAG TAAACGAGGT CTCCGAGATA AGTGAGCTAT
107521 AAACTTCAAT GCTATAGTTT TGACATGTCA AGCAACTTAA CGTGCAGCGC GAGTCCGATA
107581 AATGAGTAGC TCAGCTTTTT AGTTTTAAAA ACGAGTTGTG CGTTATTTGT ACGAGAGCCT
107641 AAGATGCTAG CTGCCTGGAA CTGAGTAGGT GGATTAAAAT GGGTGTCAGG TCTGTTTTCC
107701 CAGGCGTATC TGACTTAACG TCAGCAAAAG CTGTACTTTT AGCTTCCCTG GTAACACCTG
107761 CCGTCCTTAA CCGCCCCCTG CCGGTAGCGC CAGAAGCCTT TACTTCCATT TCTAGTTGAG
107821 CTTGGCGTCC TGCTGAGTGA CGTCACCTCC CCCTTCTCTG GAGTAGGACT GGCGGTTAAA
107881 GCTGCTTTGC TATTTTCAGT CCTCAGGCTG GAGGCTCCCC TAAGCAGGCT GCCTACGCAG
107941 TTCGTAAATT CCCACTTAGT AGACTAAGGG AGTCTGTTTT ATAAATAAGG ACTCAAATTT
108001 CTTCTGACTC CGAGGTCCGT GGCAGCAGCT ATAAGATGGA AGCCCCCTCT GATGTAAGAT
108061 TCTCAGATGA CTTGCATCTT CACTGTACCT GTCAACCCAA TAGTCTTCTA TTCCTGCCTT
108121 AAATTGTAAA TTCCAAAACT GATTTAATTG TGAAAGTTTC AAACTGTACG ACCTAGGAAG
108181 TGTCAAAGTT AGGTGACCAG ATTTTTAGAA GTCAGCCAAA TATTCAGCAT CTTTGATTTA
108241 GTAACAAATA TATTGATGGC TACTTCAGCA AAAAAAATCA ACTTTGTTTT CTGGTTACTT
108301 TGCTAACAAG CTTCTCCTGA CAGGAGGATA TAGTGAATAG GCAGTTGAAT AAGTGAGTTC
108361 GGGTGAGAGG TCTGAGCTGG AGATAAAAAT GTGTGAGTCA TCAGCAGATA AATAAATGCT
108421 GAGACCAGAT GAGATGGCTA AAAACTGAAA CATAATGTAG TGCAGCATTG TTTGTAATAG
108481 TAAATGAGTG GCAACTGTAA AGTTTTCATC AGAAAGGACT AGAGTGATCT ATACATCCAT
108541 AAAATAGAGT ATTTCTCTAC ACAGCCCTAC TAAAGAATGA GAAAGCTGTA CTCCACTACA
108601 TACTCTGGTG TACTCTGGCT CAGTTCTTGG ACTCCTCTTT TCTTGGCTAA CTCAACTGGC
108661 CTCACCACTT ACATGCTCTG TGCTCTGTCA AATAGTTTGT TCAACAGAAC ACCACGGCCT
108721 AGCTGTAAGT GCCACGTTAA CTTCTAGCAA TGCCAAAGCC TGTGATAGTG GCAGCTTCGG
108781 GCTGTTTCTC ATTCCCGGGA TGCCTAACCA CCTCTCCAAA TTCTATCAGT TTGCTTCCAC
108841 CCACTTCAAG Cjr(^Ar,AAca AAA^rATAnaa rTTaanaaaT nThnnrrrnn m^nnr^TfinrT






108901 CACGCCTGTA ATCCCGGCAC TTTGGAAAGC TGAGCCTGGT GGATCACCTG GGGTCAGGGG

108961 TTCGAGACCA GCCTGGCCAA TATTGTGAAA CCCCGTCTCT ACTAAAAAAA AAAAAAAAAT

109021 TAGCTGGGCA TGGTTGCGGG CGACTGTAAT CCAAGCTACT CGGGAGGGTG AGACAGGAGA

109081 ATAGCTTGAA CTCGGGAGGC AGAAGTTGCA GTGAGTTGAG ATCGCGCTAT TACACTTAGG

109141 CCTGGGAGAC AAGAGTGAAA CTGTGTCTCT AAATAAGTGT TTGCAATTAT AAACCATCTC

109201 CCTGACCTTA AATCTCTAGA CTCATATACA ACTGCATATT TGATGTATCT AATTGAATAA

109261 TGGGCATCTC GAACTTGTCC AAAATATGTT TATACGTAAA CACCAAGTCT GTTCTTCCTC

109321 TGATATTTGT CATGTCAATC AATAGAACTC CATTCTTCAA GCAGCTTGGG CCAGGAATTG

109381 TGCAATATTG TTTGTCCTGA GCTTCTTACA ACTTTCACCC AATGCAGTCA GCTCTGTTGA

109441 AAATCAATCA GAATACCTTT CATTGTTTTC TTTGCTGCTT CTCTAGGAGC AAGCTGCCAT

109501 GGCGGTTTGT CTGAATGACC ACAGTGACCC CAAACTGCTC TTTGTTTTP &



109561 CCCTGTCATA CAGTTTTTTC TCTATCCAGC ATCAACAGTG ATCCTTTTTG AAGGTATTAT 

109621 GTCCACTGTC TGCTGAAAAG ATTCCACTGG CTTTCCATCA CCTTCATAAT AAAAACCAGC 

109681 ATCCTTATCA TAGCCTACAA GTAAGATGAC CAACCATTAC AGTTTGCCTG ACTCTCAGGG 

109741 GTTTCTCAGG GTGTAAGACT TACAGTGCTG AAACTTAGAA AGTTCCAAGC AAACTAGGAT

109801 GAGCTGCTCA ACCTACTAGA TCTGTACTCT GGCTACCCTC TGACCTCATT CTCTTCGCAG 

109861 TTCTTTCTCT TCACTGACCT TGCTGTTTCT GGAATGGACC AAGCATTTCC AGCATCAGCA 

109921 CCTTTATATC TATTCTTTCT CCCTAGAAGG GTCTTGTCCT GGATATCTGA ATGGCTCTAG

109981 ATCTCATTTC ATTCAAGCCT CTCCTCAAAT ACCAACCTTA CGAAAGAGAC CTCCCATAAT 

110041 CATCCCTTGT AAAATAAGCT TTTCTGCTCA TTTAGCATAT ATATATATAG TTGACTATCC 

110101 TCAATAGCAT ATATATATAA CATTTCCCCA CCTAGAATTA TATATGTAAT AATATATTTA 
110161 ACAAAAAATA CATATAACTA GATATATTTT ATTTTGTGTT TGTTCTCTCT CCCCCAACTG

110221 GAATATATTT TTTGAAGGTA GGGACTTTGT TTTGTCCCAG AAGTATCCCT AGCACCTTGA 

110281 ACAGGGCTGA CGTTTAACAG GTAGTTTATG GAGGTTTGTT GAATGAAAGG ATGTGTGAAT

110341 TTTCTATGTA AGTCTCCAGG CTCTCCACTA AGCCCACCAG AATGCTAACA CAATCAATTC 

110401 CCCATCTCAT TCCTTGACCT GCCACTGCCT GAAGCAATCA GCGTGCAGTT TCTCTTTAGA 

110461 AAATCTGGGG GATAGTCTAG GGGTTGCAAA TTAAGCAACA TTATCTTTGT TCTGAACAAG 

110521 GACTGCATGA GTGTTAGGAC TGAAGAAGGC CCAAGGTGGT GGTGGGTATG CCTAAGATGA 

110581 GTATGACATA TCAGCAATGC TATGAACATA GCAATGCTAT GAAAGGCCAG GCAAAACGTA 

110641 ACAGGAGCTA GTCGTGGCTT ATTGTTACAA CGACTATACC TCCCATATGG GTAATCGATA 

110701 TCCACACACC CCTCTACATT GACTCTGGAA TTCAGGAAAG GGAATTAAAA TTTTCTAACT 

110761 TATGTACCCC AATGATTTCA ACAATATCTG GCATATGAGA TCAATAAATA TCTTTAAAAT 

110821 ACCAACTAAG AAAGACATAA AATGACCCAC CCTCCATACC AGGCTCATTT TTGCTCCTCT 

110881 GATTCCTGAA ACTATCCAGA ATGCAGCTAT GAATTCTCTC CATTGTCAGT TTTAAATTAA 

110941 GCCAAGCTGG GTACTTGTGT AATTCCTCAA GAAATCCTGG ATGAAAACTG TCAGGTGGAA 

111001 AACAGGACCT CAAAATAAAG AGACATCCAT CACTGAAGCT AACATCGTGA GGCTGAAATC 

111061 AGTCCTATAA CAATGGTACC AAAAAGAGCA CAATGAGAGG CATTTGTGAA TATTTACTCA 

111121 GATGAGAGTA AGATATTTCC CTATCAGCTA ACCTGAAGTT CACATCCCTT TTCCAGCTGA 

111181 GTTCTGAAGC TAGATGTACT TAACTGGAAC ACATAACTGC ATCAGGAACA TCCTTTAAAA 

111241 CTATGGCTAC CATGGCTTGA CTGGACAAAC CCCAGGCTTC CAGGTTTAGC ACAGGTGGCC 

111301 CTTCACAGAC CAACATTGCC TATGCTACCA ACCTCATGTC CTACCACCCT GCTTGCATCA 

111361 TTTCTCTCTC TGCATATATA AAAATATATG TGTATGTATA TAATCAGCTT TATTGATATT 

111421 TAATGTACCA CAAAATTTGC CCACTTTAGG TACAGTTCAA TGAATTTTAC CGTGTTTTCT 

111481 TAGTTGTACA ACCATCATCA CAATTTAATT TCGGAATATT TCTATCACCC AAATTTCCAT 

111541 TTCTGCGTAA AGGGGGAAAA AAAAAGGTTA ACTGCTGAAG GCCGCGGTAA CACTGAAAAA 

111601 GGTGCCTTTT CTCTCTAAAA CAGATTTTAA TCTCCCCTGA ATTTAGTGTC CTGGGTATTC 

111661 CAGGAGTCTG AATAGGGTTT CAATTTTCAG GGTCTTTTTA ATAGAGTAAA ACTGTATTGG 

111721 TGGCGATAAA TTTAGTATTG CTCTCAGTAC ATGATTGAGG GATACTTAAA TGTCTCTGTG 

111781 ATTTTATTTC ATAATCGCTA AAAGATGGTT TTTTTTTTTC CTAAAACAGG GTTTTTGTTT 

111841 TTTCTCAATA AGCTTCTTAG CTTCCCCTCC GGCTCCCTGG CTTGCCTCAG GAAATATTAG 

111901 CTCATCAGTT CTGATTGGTT GACAGCTACG AATGGCCCTC ATTGATTGGG CAGCGCTTCT 

111961 TTGTCCCTTG GAAACTAATA CAAATTTTTA ACACTACTTT TTTTCCACTC TTTCTTCAGA 

112021 GTTGGAATAT CGTTGCTCCC CTACCCATAT GTAGTGAGTG GAGGGCAAAC TTGGAGTTCC 

112081 CCTAATCTTT CCTTTTTAGG ATGTCAGCTC AGTATCATTC ATCTTAATTA CACATTGAGC 

112141 TTCTTGACTT AATGGATACA GCTCTTCTTT TGTTTAGTTG GGCGGCCCTG AAAAGGGCCT 

112201 TTGGTTCAGA AATGCAAGCT GTGGAGAAAT CAGCAACCTT AACCGCCAAA GCCATAAAGG 

112261 GTGCGTCCCT GGCGCTTAAG CGCGTAGACC ACGTCCATGG CAGTGACTGT CTTGCGCTTG 

112321 GCGTGCTCCG TATAGGTGAC AGCGTCACGG ATCACGTTCT CCAAAAACAC CTTGAGCACC

112381 CCGCGAGTCT CCTCGTAGAT CAGACCAGAG ATCCGCTTCA CACCGCCACG CCGGGCCAGA 

112441 CGCCGGATGG CCGGCTTGGT GATGCCCTGG ATGTTGTCAC GCAACACCTT GCGGTGGCGC 

112501 TTGGCACCCC CCTTACCCAA ACCCTTCCCG CCCTTACCAC GTCCAGACAT GACTTCCCAA 

112561 GAAGTGAACC AAGAGCAAGT GAGAGAATAG GAAACCGATC TTTATATATC TACGTTACCC 

112621 CTGCCCCCAC CTCCAGCGGA CACTGAGACT GAAAAGCGCG CAGGCGGGAA ATGTGACGCC 

112681 TACAGTCCGC TCCTTTAACC CCTCCTCCAA GCCCCAGGAA ATGGCGGGAG CAGCGATTGG 

112741 GGGAGGGTGG GGAGATGAGG GTGGGACCAA GCAGGCTTGA CCAATGGCCT TTATTTTCTT

112801 AACAGAGCTA CAGGCTTTGA GGAACTGGGT TAAGAATTAA ATGTAAACCC ATTCTGACTC 

112861 CAGAATTATT TTAAGTCGAA CTTTTTTTTT AACCGAATCT CTCTGTCGCC CAGACTGGAG 

112921 TACATTAGAQ CCATCTCGAT TCACTGAAAC CTCTGCCTCT CAGGTTCAAG TGTTTCTCCT 

112981 GCCTCAGCCT TCAGAGTGTA GCTGGGATTA CAAGCGCTCG CCGTCGCGCC CGGCGTGTTT 

113041 TTGTATTTTT CGTAGAGACG GGATTCGGCC ATGTTGGCCA GGCTGATCCC GAACTCCTGA 

113101 TTTCTGGTAA TCCGCCCGCC TCAGCCTCTC AAAGTGCTTG AATTACAGGC GTGAGTCACC 

113161 GCGACCGGCC GAAATCGATT GGTTTTGAAG CCTTCAGTAG CATTAAAACG AAAAGTGCTC 

113221 CCAATGCATT CCCTTTTGTC TTAAATTGGT TTCTTACAGC TACTTTACTT GAAAAGGTGG 

113281 TGGCTCTGAA AAGAGCCTTT GCTTGGACCG TCAGAGAGAC CACAGTAATC ACGCCCTCTC 

113341 TCCGCGGATG CGGCGGGCGA GCTGGATGTC CTTGGGCATG ATAGTGACGC GCTTGGCGTG 
GATGGCGCAC
CAGAGCCATC 
GCGCACCAGG 
GCGGATCTCG 
GGTGGCCGGA 
ACCAGTAGAC 
AACACCCAAC 
AGTGATTGGA 
AAGCCAGCAA 
ATTATTTAAG 
AGCTGGTCTT 
TTACAGGGGA 
CATCTGGTTT 
AAAGGTATTA 
AAAAAAAAAA 
TCTGAAACCT 
ATTGAAATCT 
CTATGAGACG 
CTTTTGTTTT 
TGCCGCCAGG 
CCAAGAGAAA 
CAATCATTTT 
GAGACAGGCC 
AAAGCAGATC 
AACCCAATAG 
TAATTAGGAG 
AAGAAAGGTT 
CTGTGTTTCA 
CAT^TTTTTA 
AGGAAGCACT 
TGGCACGTTG 
TTTGAAAACT 
TGTGCTAGGG 
ATTCTCCCAT 
TTGAAATGCA 
CAGGGTGTGC 
AGCAGAATGC 
TTGTGAGGCC 
GATTGCGTGT 
GTATGGATGA 
AAGTTGCAAG 
GAATTCTGGT 
TCGCTCTAAA 
CTAAAACTCG 
TGCAAATTAA 
CGGATGCTTG 
GCTGCTTTTT 
TTTCCTTCAC 
AGCAGGACCA 
CAGCGCGCAG 
AGTCCCACCC 
TTATACCACT 
CCCGCCGCTT 
AAGGCTGCAG 



AGGTTAGTGT 
ACAGCGGAGC 
CGCTGGAAAG 
CGCAGAGCCA 
GCGCTTTTGC 
TTCCGAGCAG 
ACTAGCGCAA 
TGATAGAAGA 
CAATCGTGCA 
TGGTATTTTA 
AAACTTGGGC 
GCCCCACTGC 
TCATAACCTG 
TAATTCCCCA 
GAGGGAATAC 
TTCACAAGAA 
ACAAAGCATC 
TCTTTCTCTT 
CTAAAGCCTA 
TACCACCAGC 
CTGGATAGTG 
GAAAATCTCA 
ACATTCTATC 
TATCATCCTT 
AAAAAAGGGA 
TATTTCCTTT 
TATATCTTTC 
TAACTGACTA 
GAAATTACGT 
GGTGCAGAAG 
TTTGAACAAA 
ACCACAGCAG 
GGTTATCCAG 
TAGCTGAGTC 
CTTAACAGCC 
TTGCATTTAT 
CTGTCAGGGA 
CATAAATATT 
GCTGACATGG 
AAAGGGCATT 
TGCAGAAACG 
GTTGTCTACG 
ACATTGCCAG 
CACTTTTCTC 
AAACTAACAT 
TGGCACTGCA 
GAGAGAGAAG 
AATGAGGCAA 
TAGGCCCTAG 
GGGGCGCTAG 
TATAAATAGG 
TTATTTGGTG 
CTGCTGCTCC 
CAGCCTCCAA 



CCTCAAATAG 
TCTGGAAACG 
GTAGTTTACG 
CGGTGCCCGG 
GGGCTGCCTT 
TTTGCTTAGT 
ATACGCCCAT 
CGCTAAATAT 
GTTTCACCGG 
TTACTACTAT 
TCAAAGGATC 
GCCGGCTTGG 
AAGGCTGTGT 
ATTCCGTATA 
TGCTCACCTC 
TTGGATTCCT 
TCAAACATAG 
GATTATGCTC 
GGTGTACTCT 
TGGGAGTTGT 
GTTCGCAAGG 
AAACACTGAA 
TTTTGATTGG 
CATTTGCATG 
GGCAGAACCC 
TCAAAAGTTG 
ACAAAGGGTT 
GAAATATTTG
ATGGGTACAA
TCTGATTGGC
TTTCCACACT
ATTCATTCTT 
AAAATGTAAT 
TCCCTCCGCA 
CTCTGGCAAC 
TTTGTAAACC 
CGGTACCCTC 
CCAGAGCGGC 
AGGCCCCCAG 
GGCGCGAGGG 
CTGCGTTGGG 
TGCTGTGTTA 
TGAGAAACCT 
GAAAAAACCC 



CCCTACCAAG 
CAGGTCTGTT 
AATAAGCAGT 
CCGGTAGCGG 
AGTGGCCAAC 
GCGAGCCATG 
GAGCTGCTCT 
GACGTTACAC 
CTACTATATT 
TATTTTATTT 
ACTTTAATTT
TTATTTTCCA
CTCTCCGGAA
TTGTAATGCT 
TAGGATTACA 
TTTGAATCCT 
CGTTTTTTCA
CAAGATGTAG
TACTTATCTG
GCTCAGGTTA
TAGAAGTGGA
AATGTGAGAT 
TAAAGGGTTG 
TGCTTTCACA 
TGGACCCAGC 
AAGAATGAGT 
GTAAACCTTA 
GAAGTGCATC 
TGCAGTTTGA 
GTTTGACGTG 
AGAGTTGATG 
ACTATTCAAA 
GGACCTCTAA 
GCCCCCTCTC 
TGATGTTACT 
TTTTTCTGTG 
CTGCCTTCTG 
GCGGGCACTG 
GCCTTTTTTT 
GTCACCATGT 
TTAGCTGGCA 
GCTGGCCCTT 



TAGGCCTCGC 
TTAAAGTCCT 
TCAGTGGACT 
TGGGGCTTTT 
TGTTTGCGTG 
ACGGAAAAAC 
ATTTATAGTG 
ACTCTGATTG 
CTATTCCAAC 
TACTTTTGCT 
AGCATCCAGA 
TTTAAACTTG 
TAAAACAAGG 
CTTTAGGAAA 
ATGTACCCTT 
TTAATTGACT 
CTATTACTCA 
AAACTTGCAG 
AAATGGCGTT 
GAGCAGGAGG 
AGCATTGCCA 
TGACCTTTTT 
TTTCTTGAAC 
ATTTTATTTG 
GTGGAAACTC 
GATACCTCGC 
AATCTTCCAA 
AGTTTCCAAC 
TTCTCAATAA 
GGACCACTCC 
CTTAGCATGG 
TACCTCACTC 
TGTCGATTTA 
CGTTTTAGCT 
TTACCATAAA 
CTAAGTGGAC 
ATTTAACGCC 
TGACCATGTC 
ATACCAATTT 
CCACATTGGA 
GTATTAATTG 
AAAGGTATTC 
ACAACTGGCC 
ACACTGTATT 
AAATTTCTAA 
AACCTACTCC 
GGGCGGCAGT 
TGTTTGCTTG 
AGACTGGGCG 
ACGGGCACCA 
CGCATCCTGC 
CTGAAACAGT 
AGAAGGCAAA 
CCGTGTCAGA 



ACGCCTCCTG 
GCGCAATCTC 
TCTGATAACG 
TCACGCCGCC 
GCGCCTTGCC 
AGCACAGCGG 
TGTAAAGTGC 
GTCTATCTTT 
TCTACAGATG 
TTGTTCCCCA 
GTAGCTGGGA 
TCCTCTTCTA 
CATTGATTCC 
AAAAAAAAAA 
TACGGGAATT 
TAGGAGTGTT 
GAAACATTTT 
CGTTCTGCAG 
TCTCCAGCAC 
TGGACTTGGC 
AGAGCTAATG 
AAATTCACAA 
AGCCATTTAG 
AAACCAGTTT 
CTGAATCAGA 
TTATTACACT 
TTTTGTATAC 
CGTTATTTTC 
AATGGGACGT 
ATTATTTGGT 
TTCGGACTTA 
TCTGCTGTGC 
ATAGTTTTTT 
TATTGATACT 
ATCTTATCCC 
TTAACTCCCC 
TTTCGCAGGC 
ATGGTGCGCT 
GGGGCATGTT 
CTGTGGAAAT 
CAGCGTTTGT 
GCGAGACACA 
CTAACACGGC 
TTACATTTCT 
TAAAACTCCT 
CTAAAAAAGA 
CTGCCTACAA 
CGTTGAGGGG 
AAACCCTCGG 
ATCACGGCGC 
TTCGTCAGGT 
GCCTCCCGCC 
GAAACCTGCT 
GCTGATCGTG 
116641 CAGGCTGCTT CCTCCTCTAA GGAGCGTGGT GGTGTGTCGT TGGCAGCTCT TAAAAAGGCG

116701 CTGGCGGCCG CAGGCTACGA CGTGGAGAAG AACAACAGCC GCATTAAGCT GGGCATTAAG 

116761 AGCCTGGTAA GCAAGGGAAC GTTGGTGCAG ACAAAGGGTA CCGGAGCCTC GGGTTCCTTC 

116821 AAGCTCAACA AGAAGGCGTC CTCCGTGGAA ACCAAGCCCG GCGCCTCAAA GGTGGCTACA 

116881 AAAACTAAGG CAACGGGTGC ATCTAAAAAG CTCAAAAAGG CCACGGGGGC TAGCAAAAAG 

116941 AGCGTCAAGA CTCCGAAAAA GGCTAAAAAG CCTGCGGCAA CAAGGAAATC CTCCAAGAAT 

117001 CCAAAAAAAC CCAAAACTGT AAAGCCCAAG AAAGTAGCTA AAAGCCCTGC TA;VAGCTAAG 

117061 GCTGTAAAAC CCAAGGCGGC CAAGGCTAGG GTGACGAAGC CAAAGACTGC CAAACCCAAG 

117121 AAAGCGGCAC CCAAGAAAAA GTAAATTCAG TTAGAAGTTT CTTCTAGTAA CCCAACGGCT 

117181 CTTTTAAGAG CCACCTACGC ATTTCAGGAA AAGAGCTGTA GTACACAGAT GAAATCCCCC 

117241 AAGCAAATGC AACACGCCCT CAATTATATT AGAATCACTT GGAGAGTCGA TAGAACTTTA 

117301 ACATAGCCTC ATCTAGTAAG AATTTACTAC TCAATCTATC AAAGATAGCA AGGTGAATTC 

117361 AAATGCACCG AGTTAAAATC GAGTTTTAAA GTCACCTGGG TTTCGGTAGC CGGAAGTCCC 

117421 GCGTCTCACG ACTCCAAGCT AATTAGTCAT AACCGTATTG AACCAAGGTT GAAGCCCTIGT 

117481 CCCAGGCTTG AGGCTTTTTA TTATACAAGG TTAAAGTGGG GATATTGCGT TTTGGGGTCA 

117541 ATATTGCTAA AGTAGCATTT TCCGAAATTG GGTGGTCCTA AGAAATGCTT CTGGGATAGT 

117601 TGGCAAAATA TATGGCTTAA CCACGCCCTC TCCACAGGAG TGGCTAGCGA GCTGTCTGTC 

117661 CTTGGGAAGG ACGGTGACCC TGCTGGCGTG GCTGGCGCCC ACGTTGGCGT CCTCTGAAAG 

117721 CCCCGCCAGG TAGGCCTAGC TCGCTTGCTT TCTGCAGCGC CATCATGACA AAGCTTTGAA 

117781 ACGCAAAATG CTTTCTTTGT GCAGCGCCTT ACCATGGGTG CACTTACGGG CTGTCGACTT

117841 GGTTTAGGCC CTTGTCAGGA CAAAGGAGCT TAGTTTGTTG GAGTTTTAGA GCTGCAACCC 

117901 AAAATCCCTT GCTCGGTTTC TCTGTTTTTA GAAACGGAAG CGCCCTGATT GGATATTTGA 

117961 AAATTACTGT GCTTAACTGG ATCGTGTTTC ATCAGTCGTG CAGGATTTTC AACCCTGGTG 

118021 GAGCCCACAC ATTCAAAACT GAAGATCCTT TTCTCAGAAC TGCCCCTTTA AGCTTTTGCA 

118081 ATTTTAATTC TGGGGGTCAG ATTTTAATAA TTGGACTTTT TTGTTTACAT CTGACAAGAG 

118141 TATATGATGA GCCAAGTTTA CTCACTTTTA CTTAGTGCAG TTCAATTCTA AAAGTTTATT 

118201 TTTGCGTGTG TGCATATGAG TTAATAATCA GTTGTATTTT TCAAACGGTC TTTTTTCAAT 

118261 TGTTTTGCTT AGCTCCTTCC ATCGTCTAAA GTCAGGGATA CAGGCACATC ACATCCCTGT 

118321 TCCCCCTTCC TCAAACTAAT ATGTAGCTAC CTAGGTTTAT CCTTTAAAAC AAAAATTCTC 

118381 ACCTATTTTT GTGAGAAATA TACATGTTTT TCTTTGAACT AAGTATTTTA CATACACCTA 

118441 TCTATATACA TGCATACTTG TGGTTTTGTT TTTTTAAAAA AAAAAAAAAA AAAACACGTT 

118501 ATCTTTTGAG ACTGGGTCTC AGTCTGTTGC CCAGACTGGA CTGCAGTGGC ATAATCACAG 

118561 CACACTGTAA CCTCCAACTC CTGGGCTCAG GCTATCCTGC AGCCTCAGCA TCCGGAGTAG 

118621 CTGGGATTGC ATGCACGCAC CACCAAGCCG GGCTTTTTGT TTTTATTTTT TGTGGAGACA 

118681 GTCACACCAT GTTGTCCAAG CTGGTCTAGA AATGGCCTCA AGTGATCATC GACCTCCCAA 

118741 AGTGTTGGGA TTACGGTCAC TGTGCCTGGC CTTGTATGCA TAATTGTTTT GTCTTTTGAT

118801 TAGGGTTATT AATTTAAAAA ACAAAGCCTG GACGCAGTGG CTCACATCTG TAATCCCAGC 

118861 ACTTTAGGAA GCCAGATGGG CAGATTACTT GAGCTCAGGA GTTCAAGACC AGCCTGGGCA

118921 ACATGGTGAA ATCCCATCTT GACAAAAAAT ACAAAAAATT AGCAAGGCCC AGTGGCACGC 

118981 ACTTATAGTC CCAGCTACTT GGGAGGCTGG GGTGGGAAGA TGACTGGAAC CTGGGAGGTA 

119041 GAGGCTGCAG TGAGCAGAGA TCGTGGCACT GCACTCAAGC CTAGGTGACA GAATGAGACC 

119101 CAGTCTCAAA ACAAAAATAA TAAAAATTTT TTACAACGAT GTTATATACA CTTCTGCATG 

119161 TTGCTTTTCT CTTAACCAAA CTTTTCTAAA ACCCTGTCAT GAAAAAAGAA ATCCTTCACA 

119221 TGGAATAGCA TAAGTTATTC ATCCATTTCT TATTGATAAG CATTGATGTT TCCAGTTACC 

119281 ACTGCTGAAC ATGGTGCAAT TGAATAGAAT TCCAGGGCTG AGATTGCTAG GTTTTAGGTT 

119341 GTATTTTATT ATTTTATTTA TTTATTTATT TATTTAGACA GAGTCTTACT CTGTCACCCA 

119401 TGGTGGAGTA CAGTGCCATG ACCTCAGTTG CAACCTTTGC CTCCTGAGTT CAAGCGATTC 

119461 TCATGCCTCT GGTCTCCCGA GTAGCTGGGA TTACAGGCAC CTGCCACCAG GCCTGGCTAA 

119521 TTTTTGTATT TTTAGGAGAG ATGGGGTTTC ACCATGTTGG CCAGACTGGT CTCAAACTCC 

119581 TGGCCTCAAG TGATCTGGCC ACCTCGGCCT CCCGAAGTGC TGGGATTACA GGTGTGAGCC 

119641 ATGGCGCCAG ACCTGGACTT TGTCTTCTGT TTCATCAGTC CTTCTGTTGG TTCAAGCACA 

119701 GTATCACACT GAAGACTGAT GATTCTATAT AAATATGGTA AAGACTGTAC ACCCTAACTG 

119761 TTCTTATTTT TTAATTTTAA GGCAATTTTA GATTCCAGCT TTCCAAAGAA TTGTGGAATG 

119821 CTTAGAGCTA GAGAAGCCTT GGAAGTCATT TAGTTTTTGT TTTGTCAGAG AAAATTCTGT 
119881 AGAGACTCTG TCCTGCTCTC ACTGAATACC ATCCCATAGT ACCCCCCAAC AGCTTTAAAG

119941 GGCAATAATA CCTTATGGAC AGTATGCTTT TCCTCAAATA TATTCTAAGC CATGGTCAAT 

120001 GCAAAAGAGT GAGAAGGAAA GTAGAATAAG TTATCTAAGA ATCAGTGGGT GCTCTCTTTA 

120061 AACTGATTTA TCACTCCCCC TTCCAAACTC TCTTGAAGGT CACTCTGCCT CCCTTTCTAC 

120121 ATAAGAACTC CTAACTCCAA GGGAGGAAGG TAAGTTATTC TTATTCCTTG CTTAGAAAAA 

120181 GAGAAAATAG GTTTGGTAAG CATCCGCTTT CTGCTACCAT TCTCTGTGTT TCTGTGTTTT

120241 TTATAGGATC ATTCAATTAT TGGTTGGCTC TTGAGAGGGA ATGCAAGGTT CAAGGACACA 

120301 AGCCTAGATC TTGCCTGTAT AGAACCTCAT GATGTTATGC TTCTCTAAAA TGAGGCCTGG 

120361 AGGAGACATG TTGAAAGTGA CCCATAAATC TGCAGTATCT CATGTCTCTC AATGGGGACA 

120421 AGGAGTACCA TGGGAAATAG CATTAGGTCA ATGACAGTAA CAACTCCCAG GTGAGTTGAT

120481 TTATTCTTTT ATTTATAAAG TTGTTAATAT GCTACATAGT CCCTAATTTT GCCACAAATA

120541 GTCATTATTT TAATTTCATA TTTCACTATT GATAAATGAA GGAAAAAATG AGTAGCAGTT 

120601 AAGCAGTCCA TAAACCTACA TATAAAGCAA ATTGGAGATT TTAAAATTGA TTCTGGATGC 

120661 TTAAAATCCT TCTCATTGAA AAAAAATTTC GTATTAGAAG ATTTCAACAT TCTTTAAACT 

120721 GAGAAGCATA ACATATAAAC AGAAAACCAC AGCAAAACAA AAATGCAAAG CTCAATAAAT 

120781 GAACACAAAG TGAACACCAT AATAATTGCC ACACAAGTAA AAAAACAGAA AATCAGCCAA

120841 CCCTCCCAGA GCCGCCTGAT GCTTGCTTCC AGTCACATTA TCACTCCATC TGCCCTAAAC 

120901 ATAACCCCTA TTTTGATTTC CAATGCTGTA ATTTAGTATG CCTGTTTTTG AAACATATAA 

120961 AATGGAAATA AAACAAATGT AATCCTATGT ACCTGACATA TTTCACTCCA GAACATTAGG 

121021 TTTGAATAGA TTCATCTGTG TTGCTGTGTA TAACTTTAAT TCATTTTTAT TGTTATGTAA 

121081 TATTCCATGT TATGAGTGCA ACAATTTAGG TGTCTACTGT TGATGCATAT TTGCTTCCCT 

121141 TTTTCAGCTA ATATAAACAA TACCGTGAAT ATTCCTGTGT ATGTGTCTTG GTATATATAG 

121201 GAATACATAT TTTGTTTGTA TACCTAGGAG AGGAATTGTT GGGTCAAATG CTAAACTCTT 

121261 TTTGAAAGTG GTGATATTAG GTTTACATGC GATGAAATGA AAATTAAAAC CACAGTTATA 

121321 AACAGCATGG ATGAACCTCA CAAACCTAAT GTTGATGGAA TCTAGCTGGG AATTCCTGTT 

121381 CTTCCATATA CTTCCCAATA TTTTTTTCCA ATTAAAATTG TTAATCTTTT GAAGATGTTA 

121441 TCCATTGTGG CAGATGTGCA GTATTATCTC ATTATGGTTT TATTTTACAT CTTTTGCCCA 

121501 TTTTTTCTTA ATTGGATTGT ATATCAGTCG ACTTGGGCTG CCATAACAAA AATACTAGAC 

121561 TAGGTAGCTT GAACAAAAGG AGTTTATTAC CTCACAGTTC TAAAGGCCAG GCCAGAAATC 

121621 CTAAATTGAG GTGCCAAGAG ATTCAGTTTC TAGTGAGGGC TCTCTTATTG ACCTGAAGAT 

121681 AGTTGCTGTC TTAGATTGTT TGGTGCTGAA CAGAATACCA GAGACCAAAT AATTTATAAA 

121741 GAATACAGAT TTATTTCTTA CAATTCTGGT GGCTATAAAG CCTATGGTCG AGGGGCCCAC 

121801 CTCTGGCAAG GGCCTTCTTA CTGTTATGGC AGATGTGAGA TGTCATCTCA TATTCAAACC 

121861 ACAGCAGTCG CCTTTTGTGT CCTCATGTGG CCTCTTCATA TGCCCATAAA ATGACCTCAT 

121921 GTCTCTTCCT TTTCTTATAA GGACACCAGA TCTATCAGAC TACTGGCCTA CTCTTATGAC 

121981 CTCATTTAAC CTTAAATATC TCCATAAAGT CCCAAAATCC CTATCTCCAA ATATAGGCAC 

122041 ATTGGGTGTT AGAGTTTCAA CATCAATTTT GGGGGAACAC AATTTAGGCC AAAAAGATTG

122101 TGTTTTTTCT TGTTGGTTTA AGATAGCTGT CTTTTTGTCC TTTTTGTCCT TTCTTTTTTT

122161 TTGAGGTGGA CTCTTGCTGT GTCACCCGGG TTGGAGTGCA GTGGCGCTGT CTCAGCTCAC 

122221 TGCAACCTCC ACCTCCTGGG TTCAAGAAAT TCTCCTCCTC CCAAGTAGCT GGGACTACAG 

122281 GTGCATACCA CCGCGCCCTG CTAATTTTTG TATTTTTGAT AGAGACGGGG TTTCACCATG 

122341 TTGGCCAGGC TGGTCTCAAA CTCCTGACCT CAGGTGATCC ACCTGCCTCG GCCTCCCAAA 

122401 ATGCTGAGAT TACAGGTGTG AGCCACCAAA CCTGGCCTGT CTTTTCTGTT TTAAGTTTTT 

122461 AAATTTTGCT CACGAACCCT TTATCCATTT TATGTGTTGC AGGTATTTCC TCTGTAACTT

122521 GTCTTCACTC TGTCAGAGGC TGGAGTGCAG TGGCACAATC ACAGCTCACT GCAGCCTCCA 

122581 CCTCCCAGGA TCAAGCGATC CTCCCATCTT ATCCTCCTTA GTAGGTGGGA CTACATGTGC 

122641 AGGCCACCAT GCCCAGCTAA TCTTTGTATT TTTTTGTAGA GATGGTGCTG TTGCCCAAGT 

122701 TGGTCTCAAA CTCCTGAGCT CAAGCAATCC ATCAACCTTG GCCTCCCAAA GTGTTGGGAC 

122761 TAGAGGTGTG AGCCACCACT GCACCCAGCC AATGATATCT CATGATGCAT TAAAGTCATT 

122821 AATTTAGTGT ACTCAAATTA AGCACACTGC CCTTTTATGC ACAACCTTTT TTGTATCTTA 

122881 TTTAAAAAAT CATTTTCTAT TTCAAGGTCA TGAAGATCTT ATTTTATAAT ACCTTCTTGT 

122941 GAAATTAGTT CTCAAGACTA CCCTCACTTC TAACACCAAT TATAAGTTGG GAGGTCTGTG 

123001 GTTCCCAATC AACCTTAGGT TAGTAATTTG CTAAAAGGAC TCACAGAACT TGCTGAAGCT 

123061 GTTAGCCTCA TGGTTACAAT TTATTATAGG ATATATAGCT TATTATGTCA TTCCAATGCA 
123121 ATGTAAAATT ATACAACTAC TTTTAAAAAG ATTTTAGCAT TTGACCCAAC AATTTCACTC

123181 TGAGGTATAC AAACAGCAGA TATGTGTGCA CATATATACC AAGACACATA CACAGCAAAA 

123241 TTCATTGTTT GTAATAGTTG AAAAGGGGAA ACAACTCAAG GAATAAAGAT TAAAATCAGC 

123301 TGAGAAAAGA AACACACAAG GCAGTATTAT GGATCGAATT GTATGCAGAT CTCCCTTGCC 

123361 CCCAGAAGAT ATGTTTAAAG TCCCAACTCC CAGTACCTCA GAATTGTGGC CTTATTTGGA 

123421 AATAGGATAG TTGCAGATAT AATTAGTTAA GATGAGGTTA TAGTACAGTA TGATGGGCTG 

123481 GTGACTTAGA AGAAGTAGTA TATATATATT TTTTAATAGA ACTAGTATTC TTCTAAGGTG 

123541 GTCACGTGAA GACAGACACA CACAGGCAGA GACTGAGGTT ATGCAGCTGC AGGTCAAGGA 

123601 ATGTCAAAGG TTGCCAGCAA GTACGAGAAG CTAGGAAGAG TCAAGGAAGG ATTTTCCTAC 

123661 AGGCTTCAGT GGAAGCATAG ATCTAATGAT ACCTTCATGT CAGATTTCTA GCTTCCAGAA 

123721 CTACAAGAGA ATATATTTGT TGTTTTAAGC CACCCTAGCT TCTAGCTCTT TGTTACAGCA

123781 GCCCTAGGAA ACTAATATAG GCACAATCCA GGCAAGTTCC AAATATGAGC TTCCAGTTGT

123841 CCTCTCCCAG TAATATGAAC AGTATTACTT TCCCAGCATT AATGTGTGAC AATACACATG 

123901 ACGTACAGAG CAGTCCCCAC TTATGCACAA AACATATGTT CCAGGACCTC CAGTGGATGT 

123961 CTGAAACCAT GGATAGTACT GAACTCTATA TAGCTGTTTT TTCCTATACA GACACAQCTA 

124021 TGATAAGGCT TAATTTATAA ATTAGGCACA GTAAGAGATT AATAACAATA AATTAGAATA 

124081 ATTGTTAAGA ATATACTGTA TAAAAGTTAG GTGAATGTTT ATTTCTGAAA TTTACCGTTT 

124141 ATTATTTTTG GACTGCAGTA GACCACAGGA ACTAAAACCA TGTAGAAACC GTATACAAGA 

124201 GAACTGTATT TCACCCGAGC CTCAGTGTGC AGTTTTAATG GCCTGCCATG GTTGACTGCT 

124261 CACATGGCCG ATCTTTTAGT CTACCTCCAC AGGTAQAGCT GATACTGTGT GGCTCAAAGT 

124321 TCCTATTATA AATCACATTG TTGACTGTGT GGTGGTCAAA ACCTCCAGGT AAACAAAGAC 

124381 ACACTTATCA GTGAGAACAT TTCAAGGGTC TAAAATTCAT CTCCCAGTAG CTGAGGGCAA 

124441 AGGCTAGACC TCTTTTTGGG TAAGATAAAT TTTTTACCAT ATACTTTATT TTGCTTTTCA 

124501 TGTTTAACTT TATTTTGCTT TTCATGTTAG TTCCCCTGGA ATTGTTTTTT GTGTATAGTG 

124561 TGAAGTAGGG GGTCT^GTTT C'l'l-l'l'lTI'Tl' CCTTTTTGTT CTTTTTCTGT TTAAAAGGCT

124621 ATACAATTGT CCCATGCCAT TTATTTACAA GAGTCCTTTC ACCATTGTTG TATGGTGCCA 

124681 CTTTAGATGT AAATCAATGT CCATATTTGT TTGAGCCTGT TCCATTCGTT TGTCTATTTT 

124741 TGGACAACAC TGCCCTGATT ATTGTCATTT TATCAGTTTT GATATTTAAT AAAGCAACAG 

124801 ATTTGTTTAT TTTGGGCCCT TGGATTTGTG TATTAAATTT GAACCCTGTT TGTCAATTTC 

124861 TATAATAAAG CTTATTGGGA ATCTGATTAG GATTACAATG GTTTTGTAGA TCAGTTTGGG 

124921 GACAATTAAT ACCTTTAAAA TATTGACCGC TTCAACTGTA AATATACTCC TCCATTATTT 

124981 AGTTTTCCTG TTTAATTTAT CTGAGTAATA CATTATAGTT TTCTTCGTAG AAGTCAGATA 

125041 CGTAGAAAAT TCAAAGCCCA AGTGCAATAG CTCATGTCTG TAATACCAGC ACTTTGGGAG 

125101 GCCGATGTGG GTGGATCACC TGAGGTCAGG AGTTTGAGAC CAGACTGGCC AACATGGTGA 

125161 AACCTCATCT CTAGTAAAAA TACAAAAATT AGCTGGGTGT GGTGGCGGGC ACCTGTAATC 

125221 CCAGCTAATC AGGAGACTGA GGCAGGAGAA TCGCTTGAAC CCAGGAGQCA GAGGTTGCAG 

125281 TGAGCCAAGT TCCTGTCACT GCACCCCACC CTGGGCGACA GAGCGAGACT TCGTCTCAAA

125341 AAAACAAAAA AAAGAACATT CAAATAATCA ATGTAGATAA TTCAAATAAC TAAAAAATGA 

125401 ACAGTTATTA AAATATCAGG ATATAAAAGC AAAAAAATCA ATAACCTCCA TATATACAAA 

125461 ATGGCCAGTT AGAGAAAAAA AAAAGAATAG GCGAGACTTA AAAAGGCTGG GAATCTCCCT 

125521 GAAAATCTTT GAGAGCCTTG GCCCTGCCCT CAGGGATTTC TCTGGCTTCA TGCCCAGATA 

125581 CGGGTACAGT TCCTTGTTTA AAAAAATTTT GCTCCATCAA TCAACAAGGG GCTCCTTCCT 

125641 CAGAGCACAA GGACCTCCAT AACACCGGAC ACTAGATGTC TAAGGGACAC CTCTTAAGGA 

125701 AGTTAGACTT CCAAAGAATG GTGTTTCCTC TGTCCCCAAA CTCTGGAACT CAGAGCACAA 

125761 CTGCTCCTTG GAGTTCGGTT TCAAATCTAC AAGGCTGTCA TGGAGGTTGC AGACCAAGTC 

125821 CGTGGCCTCA GTGTCCGGAT GTACGGTGGC CTTGGCACCT GAATGTGAGA ACATGACCTC 

125881 CCTGAAACCA CCACAAGTAT TGTTTCATGT TATGTATGTT TTTTCTTATC TGAAATTCCT 

125941 TTTCTTTAAA AATTCAAATT ACATATTTTG CAAGCCCCTG AACAAGCTTC ATGAGCATTT 

126001 ATTGAACCCA CAGCTTTTAA AACCTACTGA ACACTTTGCT CTATGTTGTC ATTCACTATC 

126061 CACCAATTAT TTAATTATTG ATCAATATTG TTTCCTTAGT GTTGGGATCA TTTATGCATG 

126121 TATTTCTTTT ATATTGCATA TTTTATATTT CTGCATTACA GTTATTACAT ATTACTTTTG 

126181 CTACAGTAAT AGTTCAAAAG TGTACATCCA AAATTTAGCT GTGAAGTGGA TGGACTGAGG 

126241 CAGAACTGGA GGCAAGAAAA TGTCACAGTA ATTCTAAAAA AGATGATGTA CAATTAGAGC 

126301 AAGAGAGTAG CACTGAAATT GAAGAAAAAT AGATGCGTTT GAGAGAAAAT TAGGAGGTAG
126361 AATCAACAGA TTAGATGTAG GGATGAGAAG GGTCAAAGAT GACACTAGGG TTTTTAACTG

126421 GAGCAAGTAG GTAGACAGAA CATTTCTTCC TGAAAGGGCA GGTCAGATCA TGTGTTGTCT 

126481 CAAAGGGCAT GAAGAGTAGA AAGCCTGGGA CAGATCCTGA GATGACCAAT ACCCATGGTG 

126541 CAGGGAGAGG GAGGGAGATC TGCTAAAAAG ACTGCAAATG TCAGGATAGT AGAAAATCAT 

126601 GAGTGTGTGA TGTCCTGGAA GTTGAGACAG TATCACATTT GAGAACATTT AAATTGGTAA 

126661 CTCTGACAAA AAGCTGGAGG CCAACTGTGA ATGCCCATGA GAGTGAGAAG CTCCCACACT 

126721 TTTGTGGGCA TCAGAAAGCC CACCAGGTTC CTGCAGTGAA GATCTGAGAA GGATCCTCTT 

126781 GTGGCTTTGG CAGGGAGAGA AGAATTATTA TGAAATACAC CCCAGAACCT TCTTCAAAAC 

126841 AAAGGCCTAC TCTCAAGGGG AAAACATTTT GCCAGAGTCT TATCCCAGCT GGGAGAAGGT 

126901 AATTCTTCCC ACTGCAGCCT CATCTAGGCT TTCTGTCTCA CTTAAGGGAA GAAAATTAGT 

126961 CAACAGGGAT CAGAGCTTCA TGAAAATAAA TTGGAAATGG TGCAGCCAGG AAAGGAGCAA 

127021 AGGTCTGAGG AGGAGGAGAA GGAGGAAGAG GAGTTGTATC ATTATAAATA CTTGAGGAAG 

127081 AGGAGGAGAA GGAGGAGGAG GAGGAGTTGT ATCATTATAA ACACTTGAGG AAGAGGAGGA 

127141 GGAGAAGGAG GAGGAGGAGT TGTATCATTA TAAACACTTG AGGAAGAGGA GGAGGAGAAG

127201 GAGGAGGAGG AGGAGTTGTA TCATTATAAA CACTTGTGAC GGTCCCAGCC CCAAGATATA

127261 GGCATGCTAA TAAACTGAGG CTTAACACTT TGACTACAGA ATGCTGCTTC TCCCTAACAC 

127321 CATCAAGGCT CCAACTGAAT AACAATGAAT TATGAATGAA AGAGCTGTAA GGAGAGACAA 

127381 AAGTTAGAAT GAGACAAGTA TTGTTATCTA GAGATGCCAA GAAGGCAAGG AAGATAACTA 

127441 AAAAGGCACT CTGGATTTAG AAATAGGAAG TCATTAGTGA CCTTGTAAAT AATGGAGCCA

127501 GAGGAATACC AAGGGCAGAA GCCTCACTAT AGTGTGTTGC ACCTGTCAGA GGTCAGGAGG 

127561 TGTAACTGAC TCTCCCACAG TGTGGCTTTG GAAGAGAGAA GTCAGCAGCT GCATGGAGAT

127621 TTGGGAGAGG GAAAGCTTTT TTTTTTTTTT TTTAATTGGA AAAGACTGAG CTATGTGTAA 

127681 ATAGAATAAG ACAGGAAGAG TGTAGACACA GGAAAGAGGG CAGACAAAAA CAAGTGCACA 

127741 GTTATCTAAG GGAAACAATG GGATCAAGCT GCAAGTATAT AAACTTGTCT TGATAGAAGA

127801 ATCCTTGATC TGGTTTATTC AGTGTTTGGT CCAAACCCAC ATCCCTGTTC TGCCTGTCTC

127861 TGACTTGCTC TGTGCCCCAG AAGCCCAGCT TCTACAGATA GCATTAGCTG GGCAGCCCTG 

127921 CCCTCTTGCA ACAGCTGGAT TTGGCCAGTG ATCAGCCCAG CAGGAATGTA GATGGCAAAG 

127981 GAGAGAGAGG TTAGTGTACT TATTCCCTGC ATCACCCCCC TGCTTGGTGG GCAGCTCTTC 

128041 CTCCACAGTC CCAGCTCTGG CCTAGCTCTG GTTACAGGTT CCCTCCCATT GCCTCTTCAG 

128101 ATTTAAAGGT GTGTCTGTCA GGGTATAACT GGGAGCTAGA AATTGCACTG AAATTGAACA 

128161 AAGAATTTTA TGGGAATGGT TGTTAACTAG TTATAAGAGG ACTGAAAATG GAAAAGTGGA 

128221 CAAACGTATC AGAGATAGTA ATGACAGAAA GCAACTACCA CCTCCAGGTT TAGGAGAACA 

128281 AGGAAAAGAT TCTTTGAAGA GATCCCCAGA ACTGGGACCT CTGAGGAGTG TATGCTGGAC 

128341 CACTGATGAT GATATGTCTG TAGATAGAGG CATGATGAGG CTGATTTTAG GAGCATGGAA 

128401 GATCTCCAAA CTGAAGCCAA CTGCTGTTAC TGGATTCAAC TGCCACTGCC AGGTTGAAGA 

128461 ACCCATTCTG TGAGGATGTC AACAAACAAA GTGGGAAATC TTTTCACATC CTTCCAGCCC

128521 TCTAGTCTTC CTCCAGTGCT TTCTATTGGT AGGGTTTGGG GAGGTGGCTA GCAAAGCGGT 

128581 ATTGGAAAAG ATAGAAGAGA CTAAATCTTC ATAACCAGCA CAGGGTGACA CTGGATCACT 

128641 ACTGTTGCTG ATCTTGGGCT GCCTCATATC CCCTGTTCTT CCCATTAGCC CTGTCACAAC 

128701 TTTGTAGATA TCCCTTCATT ATATGCCCTT CATATATTCT TTTGGTTTAA CTTTTTCTGT 

128761 TGGAATCCTA ATATGGCACT CCTCCATTTT TCAGGACCAA AAGAGTATAA AAGATTATCT 

128821 TTTACCAAAA AAAAGACAAA AAACTGATCT AATTCCTGAT TTGATCATTA CACAATCTAT 

128881 ACATGTATCA AAATATCACA TAGTACCCCA TAAATATATA CAACTGTGTC CATTAAAAAT

128941 AAAAATTAAA GAAAAGATGG TAAATATAGC TCTGTCAGGC AGTGGAGGTT TTACCACGAT 

129001 GGCTGTTATT TCCCCCATGA AGGGGGGAGT GAGGGAGCAG CTGAAAGTAG GTGCTTATAG 

129061 GGGTATAGAG GGGCTCAAAG CTTTGAGAGA GGAGAATGTC TGAAAGAGCT GCCAAATAGC

129121 ATGCAGGTCC CATGGGGGCA GAGCCTCTGC TCATTCACCA GTGCCTCTTC AATATCTACA 

129181 CTTAAGCCTA ACACAAAGTG TGTGCTTAAT AAGTATTTGC TGAGTATGTA AAGTGGAAAC 

129241 AGAACCAATC TGGCAAACTT TGTAGGACTG GTGGGCAATG AAGATCAGTC AGGTAAAATC 

129301 TGTGGATATA AATTTATATT GATCAAAAAA TTCAAGGTTA GGTGTTTTTC TTCAGTCATG 

129361 CTCAACGATG CTTCAGCCAT GCTCAACTCT TCTGTAGCCA CAGAAAAAAG TTTACCCATA

129421 ATCGAGCTGT GTCTGTGTCT GAATAATGAA AAGACCATGA TGCAAGGGAG TTGGAGACAC 

129481 AGAAACAGTG TTTGAAGTAA TGGGTAATGG AAGCATGCTA CCAGGGAAAG GAAAGAAGTG 

129541 GCAATAGGAA GGAACAGAGA TCTGTGGTCC TATGTCCCCT GAGCATATTC ACATGTTAAA
129601 GCTAATTCAG TTTTCAATCA TCATTAAAAT TTTGTTCCTA AATATATGGC CATTATTTTC

129661 CACAACCACA CTAAAACTTT ATTACCTCTG GCAAGTGACT ATGCAAGTAA CTAAGAGCAA 

129721 AAATATCCAC AACTACCATT TGAGCTATCA ATTTAGGGAA AGTCATCTGG CTATAATCTA 

129781 AGTGACCCTC CACTGAATGT CAGTATCTTT GCATATGTGA TTTAAATCTG GGCCTTCGCA

129841 ACACCATGAA CTGTTCTTGT CTTGAATATC CAGATTGAAG GAAATAATCT GAGTAGTTAC
129901 GAGTCCTGAA GCTAGAAAGA TGGAAACCCC ATTTGCTCAT CAGAAAGCCT TAGAGCTTGG 
129961 GCGCTGGCGG GTCCTGTCTC ACCGGGACAG AGGGGCTCTT TCCTCCCCAT CTGATAGTCT 
130021 GATAACTAGA GAAGCCGGCC AACTTATTCT CCAAGAAGGA GCCATCTTAG TTCCTCCTGA 
130081 AATGTTCATA TTTAGAAATT ATTGTTTGTC AGTAATTTAA CCCCTTAATG GGCTTGCCTT 
130141 GTGGTCCATA CCACTGAGTG CAGAGCTTGC CTGGAAGAAT TGTGAGGGCC ATTCCATCTT 
130201 CCAGGCAGTA GAGTTCAGTA CTTCTTTAAA ATTGCTGCTG AACTCTGTAT TTGAAAAGAA 
130261 AGAATCATTT GGGTGTGGTA GCTCACACCT GTAATCCTAG CGCTTTGGGA GGCTGAGGTG 

130321 GGAGGATCAT TTGATGCCAG GAGGACCACT TGAGACCACC CTGGGTAACA TAGCAAGACC
130381 CTGTCTTTAG AAAAAAAAAA TACAATAAAA TAAATACAAT AAAAATAAAA GCAAAAAGAA 
130441 AGAGTCCATC TTAGGGACAG ACTGTAACTA CTCACTGGAG CTTACCTTTA CATAGTTCAG
130501 GATCAATTAT AATAAAACAC TTTTGTGCAG ATTCAATAGG ATTATTTTAA TCCCCATCAT 
130561 CTCTCTGAGT TTCCAGTCAG TTTCTCTGCA TGTAGACACC CTTCTCCAGC CCACCATTGT 
130621 CTCTCCTCCT ATAGCTCCAC CAACAAATCA GAACTTTTTC TAACTGCACC TAGTGCACCT 
130681 AGAGTCTACT CCAGAATGCT CATGGAGAAA GTTTCTGAAA GGTAAAACTC TGAATGATAT 
130741 TTGTAGCTAA AGGGAGACTT GCTAGAGACA ATAAGCTAAT AGTTGTAGAC TTCAGTAGAA
130801 GAGGAATGAC ACTGCAATGT CAGGGTGCAG GACTTCAAGA GGGCAGAGTA TGGAAACCCA 
130861 ATGGGAAAAA TGCTCACCAG GAACATGAAG AGAAGGAATT ACGTGTAAGG ATTTCTCAAT
130921 GTGTTCCCAA ATTTGCCCAG CAGAGGGAGG CCTCGGGTTG ATGGCAGGCT GACCACACAA
130981 TTAAAGAAGG CTGAACCTGG GGGCTTTTAA CAACCATCGT GGGCTCTACT GTAAGCATTT
131041 AGAAAAAGAA AGTTATCCAT TCAAAAATAT ATATATTTTT AAACTTCAGA ACAAAATTAT 
131101 GAAGAGCTAT ATTTACTTTT CTACATTCTA ATTTTTATAA ATCTGAGTAT ATTTTGCATA 
131161 TATTGTTATA GTACATATTC AATTTTGTAT TTTGCTGTTT TCACTTAACC ATTTTTACTA 
131221 GATTACTCTG TGTTCATAAT AATCACTTTT TTAAAACTTT TATTTTTATT TATTTATTTT 
131281 TTTTTTGAGT CAGAGTCACA CTCTGTCGCC CAGGCTGGAG TGCAGTGGCG TGATCTTGGC 
131341 TTACTGCAAC TTCCACCTCC TGGATTCAAG CAGTTCTCCT GCCTTAGCCT CCTGAGCAGC 
131401 TGGGATTACA GGTGTGCACC ACCAAGCCCG GCTAATTTTT GTATTTTTAG TAAAGACGGG 
131461 GTTTCACCAT GTTGGTCAGG CTGGTCTCCA ACTCCTGACC TCATGATCTG CCCACCTTGG 
131521 CCTCCCAAAG TGCTGGGATA ATCACTTTTT ATGCTGCATA ATTCTTCAGA TTTGTCAGTA 
131581 CGACTGTATT TACACTCATT TGTTTTATTA GAAAGAATTC CAGAATATTT TGGCTGCCCT 
131641 AATTAATTTT ACAATTAATA TGATTTTGAA ATTGGGTATT GGCTCCTTCT GAATTGGTTT 
131701 ATTAAAATAT ATTCTAATGT AATTTATGAC ATTTTCATCA TATTAGCATA TTTATTCTGT 
131761 TAGAATTTCA TAATTTATAA AGCTACAAAC TGTATGTGAT ATAGCTTGTA ACTTTATCTC 
131821 ATAACTTTAT GCAGTTACAA GTAGAAATAA AATGTTCCCC TCAAGATTGC TTAAAATTTT 
131881 ATTATAAACA AGTGTAAAAA ACAAAATCAC TAAAACACTC CCTCTTTTTT CCCCCAAAAT 
131941 GCATGTTTCC ATTTTAACAG AACCCGTATT TAATCAGCAG ATTTCTATGG TGGCTAGATT 
132001 TGTAGACTAA ATATTAAAAG TCCCAAAGCA AATGCATTTT TCTCTTAAAT TTTACTGACT 
132061 YTTTTTTTTT TTCTTTTTCT GAGACGGAGT CTTGCTCTGT CGCCCAGGCT GGAATGCAGT 
132121 GGCACAATCT CGGCTCACTG CAACCTCCGC CTCCCGGATT CACGCCATTC TCCTGCCTCA 
132181 ACCTCCCGAG TAGCTGGGAC CACAGGCGCC CGCCACCACG CCCAGCTAAT TTTTTGTATT 
132241 TTTAGTAGAG ACAGGGTTTC ACCGTGTTAG CCGGGATGGT CTCGATCTCC TGACCTCATG 
132301 ATCTGCCCAC CTCAGCCTCC CAAAGTGCTA GGATCACAGG CATGAGCCAC CGCGCCCCGC 
132361 CTACTGACTT TTATCCAAAG AAAATATAAG AGCTCTTCAT CATAACGTAT GTTTCTTGCT 
132421 CTTGTTATTA AATATGACAC ATTTAGACTT AAACTGATTT GAAGGTTTAT GACATTGTTT 
132481 AAGTTATTAC ATAATTAATT CATAAAGATA ATGACTAGTT TGAACTACTG ACAGCTCACA 
132541 CATCATCAGT TGAACAGCAG AAAGCTTACT AAGCTACTTT CTTATGTTTC TGTCTCCCAG 
132601 CTACTAAAAG AAACGAAACC CTTCCAGGTG TTAAGGCAAA ACTTTCCTCC CCCTTTCTTC 
132661 TATAAATCTG ATTCCATGTT AGTGAAATTT CTACTGATGG CTTTGGTTTC CTCTATAGTA 
132721 GAATAGAGAT CCTATGGCAA AAGTCATGTC TGACATGGTA GCAAATAGAA ATGGGGAAAA 
132781 GGAAGGTCTG CAAGAGCCAA TGTGGGAAAT GGGGAGAGGA CTGACTACAA AAACCCAGCA 
CTTTTGCCCA 
TAACAAGAAT 
ACACAGGTGC 
ACAGGGGCCA 
AATACAACAT 
ATGATTATGA 
AATTTTTGTT 
GCAGGATACT 
ACTTATATGC 
CAACAGAGAT 
TAAAATAAGT 
TGGCAATAGT 
TCCAACACAA 
CTGATCACTA 
CCTGTAATCC 
AGACCAGTCT 
GCGTGGTGGC 
GAACCCAGGA 
AACAGAGCAA 
AGAACATCAC 
TTGAAAAATG 
AAAGGAAAAA 
AATCTGGAAA 
AAACAGAGAA 
CAAGGCTATT 
CATCCGGTGT 
AAAATAGACC 
AGAAAATCAT 
TAGTTTATAA 
CAAAAAACAT 
AGCTACAGGT 
AGACTCTTCC 
AGGCTGAGAG 
TAAGGAACCT 
GACATTTCAC 
CATTAGTTAC 
TAGCTAATGT 
CTCATATATT 
TTCTAACATA 
ATAAATGAAC 
AATGCTATAA 
TAATATATCG 
ATGTACGGAT 
ATCTCTACTA 
AGGCTGAGGC 
TGCCATTGCA 
TGCAATATAT 
ATACACAAAA 
ATTCCATTCA 



TTCCCCAAAC 
AAGGGGAGTA 
TGACTCAAAG 
AAGGATGTCC 
TATTGTGAGT 
CCTTCTACTA 
CTAATTTTGA 
ATTGCTATCA 
GCATTTATGT 
CTTGAGCATA 
ATTATGTCAA 
TGTTCATGAC 
ACATTAAGTG 
AGAAGCTAGC 
TACTAGAGGC 
TACAGCTAGA 
TAACAGTAAT 
AGAAATGAGA 
TACACAGTAT 
CAGCACTTTG 
GGCCAACATA 
ATGTGCCTGT 
GGCGGAGGTT 
GGCTCTGTTT 
TATGCACCCC 
AAAAAATGAA 
AGTCCGAGGG 

GTATTTCAGA 

CCCAGAAATA 
TAATAAGTAA 
GGGGGAGAGG 
ATAGACTTAA 
CAACACCCTA 
AACATAAATA 
TATAAAATGA 
TGCAGTGAGC 
TAAAAAATAA 
AAAATATTTA 
TATAACTTAG 
AAAAGAAAAC 
TAGGGAAATG 
TAAAAGGACT 
GTGAATGTAA 
AAATTAAACA 
ACATGTCCAT 
ACTGGAAACA 
GCCAGACGCA 
CACCTGAGAT 
AAAAATTAGC 
AAGAGAATCA 
CTTCAGCCTG 
CTATATCTTG 
TGGATGAATC 
TATGAAATTT 



CACATTGATT 
TTCGAGAAGA 
ATGCAGCTCC 
TCAGGATGGT 
GGTAGGTGGT 
ACCAGAACTC 
GTCTAGGAAT 
CTATGCTTGG 
CTAGATTGGG 
AAACCAACTG 
TAAAAGAAAT 
AGGATGAAAT 
AAATAAGCCA 
TAACTAAGTA 
TGGGAATGGT 
TAAGAGCAAT 
AAATAATTTC 
AATGCTTGAA 
GTATAAAAAT 
GGAGGCCAAG 
GTGAAACTCC 
AATCCCAGCT 
GCAGTGAGCC 
CAAAAATAAA 
ATATATACAT 
ACACAAATAT 
CTTAAACTAT 
ATGAATTGGT 
GATTCACACA 
AAAAATCGTC 
AGCAGGAGCC 
ATGTAAAAGC 
GGATTAGCAA 
ACAAAATGAT 
AAAGCAGGAG 
CAAGATGGTG 
ATAAATAAAT 
TAATACATGT 
TAAGATGACA 
ATACAAATGG 
CAAGTCAAAA 
GACAATCCCC 
GAGGACAATG 
CTTATACAGC 
ACTATGACAT 
ACCCACQTGT 
GTGGTTCATG 
CAGGAGTTTG 
TGGGCATGGT 
CTTGAACCGA 
GGCAACAAGA 
GAATATTATA 
TCAAAAATGT 
TAGGAATGGG 



GCTGGGAGAA 
CGTCTCTGCA 
TTTCATCCTG 
CTCTAATCCA 
TATGGACCAG 
ACACAGCCAT 
ACGACTGTAG 
CCCAGGCCTG 
TTGGTTGGGA 
ATACAATGAT 
GTGATAACTA 
CCTGTCATTT 
GAAACAGAAA 
AATAAGTTTA 
AGGGGAAAGA 
CAGTTCTAGT 
AAAGAGCTAG 
ATAATGGATA 
AACACTATGG 
GTAAGCAGAT 
ATCCCTACTA 
ACTCAGGAGG 
GAAATCGCGC 
TAAATACATA 
ATAATTATTA 
GAATCAATCC 
TCAATCAAAA 
ATAAGGTTAG 
TCTATGGACA 
TTTTCAGTAA 
TTACCTCAAA 
TAAAATTATA 
AGATTTCTTT 
AAATTTCATC 
GCTGAGGCAT 
CCACTGCACT 
AAATAAATAG 
ATCTQACAAA 
AGCCAAAACA 
CCAGTATGCA 
CCACAATGAG 
AGGGTGAGCA 
TTACAACTAC 
CCAGCAATAT 
GTACAAATGT 
CCATCAACAG 
CCTGTAATCC 
AGACCAGCCC 
CACGGGCGCC 
AGAGGCGGAG 
TGGAAACTCC 
AAGCAATAAA 
GAAGGAAAAT 
AAAACTAAGC 
139321 TGTAATTATG GAAAGTACAT CAGTGGCTGC CTGGGGCCAA GAGGATGGAA GAGGCGGCAC 

139381 AGGTGATACT ACAAATGGAA ACTATCTAGG TTGACGGAAG TGTTCTGTAA CTTGATTACA 

139441 GTAGTAACTG TTTGGGTATA TAAAACGCAT CAAATTGTAT AATTAATACA GGTGTATTTT 

139501 ACTGTGTATA AATTATTCCT CAATAAAGTT GATTTTTCAT TAAATATATT ATTTGCTAAA 

139561 ATGAQGAGAG ACAACTATTA TCTTAAAATA GTTAAGCACA ATAAAAATAC TACAATCAAC 

139621 TCATTATATA TGGAAATTAA AGGAGAAAAA TAGTGGTATG ATTAATTAAA ATAAAAAGAA 

139681 AACCTTCTAA ATTTTATCTT AGCTCATAGT TGTAAAAGCT GCCATCCCTA ACCAAGGCCA 

139741 CCCTTGACCC TTTCTCATGT TCCATCTTTC TGTTTGTTTC ATAGTTTATG TCTCACCAAA 

139801 ATCTATCAGA TAAACGTATT CATATGAAGA TTTAAATATA TTACATGTTA AGCCTTAGCG 
139861 AATACTTCAA TATCTAAAGA AGGTACAAAC AAAACAAAAA TCAACACTTA GTTATAAGAG 
139921 ATTACATACT CTCCAGGGAA GACCTGAAGA CTAGCCCCTT TCTGGATCCC ACTAGCCCCT 
139981 CATCCCACTC CAAGCCCTCC CCTCCAATCC CATATGCACT GGGCATTCAT ACAAATAAGA 

140041 CCATCAGCTC TGGATATCTG TACTGATTGA TGCTCCTGCT AACTACCTGA ATGATTGCQA 
140101 TGTAAGQACA GCACTGCCTG AATCCTATTT ATCTCTCGCT ATGCCATAGC GGCCTTCCAT 
140161 GCTGATGGCG TGTTTGAGGA TCCAGAGGGG TCTTTGGTTG GCAGGATTGT TTTATTTCCC 
140221 CAAGAGGAGA GCCTTGATGC AAAAATAGGT GAAGAAATCA GTACAACAAA ACAGAAAGCC 
140281 TAGAAACTAC TATGAACACA ATAGAGCAGA AGTAGCCTTA AGAGTTGGTG GAGAAAGGAT 
140341 GGTCTATTCA ATTACCTGGG CTGAGAAACT GGCTTTCATA TGGAATAAAA ATAAAATTAT 
140401 AGCTATACCC CATATCATAC ACAAAAGTTT CTACATCTAA CAAAGACACA GATAGAAAAT 
140461 GTTTTAAAAT TTTAGAAGAA AATAGTGCAG AATTTTAGTG CAGAATTTCT TAGACTAGAT 
140521 GCAAAAACAA AAATGATTAA AGTGGCCAGG CACGGTGGCT TATGCCTGTA ATCTCAGCAC 
140581 TCTGGGAGGC CGAGGTAGGT GGATTAGTGG AGGTCATGAT TTCGAGACCA GCCTGGACAA 
140641 CATAGTGAAA CCCCATCTCT ACTAAAATAC AAAAATTGGT AGGGTGTGGT GGCTCACGCT 
140701 TTTAATCCCA GCTACTTGGG AGTCTGAGGC AGGAGAATCA CTTGAACCTG GGAGGCAGAG 
140761 GTTGCAGTGA GGGGAGATGG CGCCACTGCA CTCCAGCCTG AGCAACACAG CGAGACTCTG 
140821 TCTCAAAAAA ATCTAAAAAT AAAAAGATTA TTTTTAAAAG ACTATTTTAA ACAAAAAAAA 
140881 TCGTTTAAAT GATATGACAC ACTACATCTA ATATTTGGAA AAGTACTTCT TAATACTTTT 
140941 AATAAAAAGA GGCGCTGAGA GCATACAACC TATCCTCAGA AGAGTGTTTG ACCTCTAGGA 
141001 GGGACGCAAG CGCGTTCTTC CTTCATTTTA ACTGGTCATT TTCATTTATT TCAGGAACAT 
141061 CTGAAGTAAA CACAGTCACA CGTTAACCTT TAAAAATCTA GGAGGTGCGT ACGCATAGTT 
141121 CCATTACTTC AATnTTGTA CTTTTGCATT TTAAAATATC ACAGGGAAGC TCGGTACAGC 
141181 TTCAAGGCTA GGAGGGGTGG CTCTCTCTTA AGCCCTGTCC CCGCCAGCCC CAGACCTCTC 
141241 GTCCCGCCCC CATTGCCCAG TCCCCACCCT CACTTCCCCA TTTCCCCACT CCCGCGGTCT 
141301 CTTAACGCAC CTCGTTTTTC GTCCAGTGGA CTCAGACCTG TAGTCTTCCA CCAGGATCGG 
141361 CTCCTTTCCC GGAGCTCTCG CTCTTAGAGG AAATTGAGAG AAGCATCAGC GGAGACCCAT 
141421 CTGTGGCTCT CCAGAGGGCG CGGCATTCAG ACCCCAGATC CAGCTGTGAG AACGGACCCC 
141481 AGGCTCACAC CAGGCCTGCG GGAGGCGGCC CACCAGAGGC GCTAGAAAAC AAGCCTCGCG 
141541 GGGAGGCGCG CAGGGCGACT GCAAGCTGTA GGGGGCGCTG GCGCCCTCAC AGGCCAGGGG 
141601 CAGGGCCGGC GCTGCGGGCG GGGCTCCTGC GGCGTGAGGG GCGGCCCCAG GCCAGCAGCT 
141661 GCGCCCTGGC TGGGAGCCGG GGAGCATTTG CTGCTCTGCT GGACCCTGAG TCTGGCGGCG 
141721 GGCGGCCTCC TCTCCGCTCC CCGCCCGCCA TCCCCCAACT CCCGATCTCT CTGCTGCGTC 
141781 TGGCCTCAGG CTQAQACCCC AACGAATCAT TCCCCGCATG GGAACATTTT ATGATATAAC 
141841 TGAATTCAGT TTTATGTATA ACTGAATTAC GGATATGAGA ATCTCAAATG AGGACGAATG 
141901 GTTTTTACGC ACAAAACATG AGACACAAAT CTGTAAGAAA TATAAAGTCG TGACCACGTC 
141961 CTTTCAGAAC TTTAACCTGT TTGCTGAAGT ACGTCAGTAA CAATQGCAGG GAAAGGGTAT 
142021 CTTAAATTTC ACCACAGCCT CAAAGAGGCC ATTTCGTGGA TCCGCTGAGG CTTGGAGTCG 
142081 GCCTTCTGAC CACGAGTCCT GCGGCTATGA AAGAGGAAGC CGCGGTTCAG GGCGTCCTCG 
142141 CGAGTCGTGC AGCCCGCCCT GCTCCAGCTG GGGACACCGG TGGTCACGGC GCTTTCCAGC 
142201 TGCAGATCCA GGCGGCAGCC CAAGATTTGG TCCAGCCGCC AAGGGGTGGC TCGAGTGACT 
142261 GACGGGCCTT GAACGCTCCC AGGACCCACA TCTGGAGAGG GAGGTGGGGG TGGGGTGCTG 
142321 AAGTCATTCT TGGGGCCCCT GGGGGCGGGC ATGGACCTGG GTAAGGCCAG AGAAATTGAC 
142381 ACCTCGTGAC ATCCCTGGAA GAGAAGTACG TTCAGTGTCA CTCCAGAGCT GAAACCGCCT 
142441 TCTGGCTGGT CCCTCCTCAC CTACATACTT TTCTAATTTG TCTGGAGCAG GCCGGGCATC 
142501 TGTATTATCT GGTTATTTAA ATATCTGGTT ATTTAAAAGC TCTCCATTAA ATTCACATAC 
142561 ACGAAAATAA AAATTAAAAA AAATTTTAAA AAAAAGAAAC AAAAGCTCTC TAATGACCAA 

142621 GTCCTACACG ATAGTGAATA AATTTTTTTG TGTGGTCCCT AAAATTGAGT TCATGCCTTT 

142681 TCTGAAGTAA TAGACGCCCA GAGAAGGGAT CGACTTACCC ATCATGCCAC AGAGATTAAT 

142741 TGGCCCCAGA ATTCTTTAGC AGACCGTGTA TATGAACGTC CTTTGCAATC ATATAAATTA 

142801 ACTGGGAAAA CCTCATTTAG TATGTTACAT GCCTAGCGTT TTQTGCCTGA ACACCTTACA 

142861 AGAACCAGGG ACTATTGCCC CAATATTATA TTTCAGGAAA GGAAGGCCCA GACAAATGGT 

142921 GTCACTGGTC CACTTTCACC CAGTTGGTAA ATGAAACCAG AAATTATAGC TGTACCACAG 

142981 AAAGGTGAAA ACGTTTCTTT TATAATTTCA CATACAATCT TTAATGGACC CAGTGTCCAA 

143041 CACATTAAAG CAAGTGCTCA GGAGTGACAT CAAGATGTAA AAAATAGTCC TGTCCTCAGG 

143101 GAGTTTAGGT CTTGGAGAAA AGAGACCCAA GGAGACACAA GACAAAGGGG AAAGAGAAGG 

143161 AGCGCTGAAG ACTGAGGACC CTGCCTGTGG ACTGAAGTGA GGATGGGGAC ACCCGATGCC 

143221 CGGAATATGA CAGTTTGGAG GGGCCTGAAG GACTCTTCTA TTCTCTATCA GAAAAACAGA 

143281 ATTACTCTCC TAACCAGAAA AGGTATTTCA ATTTATATTT TCCATCACAG CACTTTTCTG 

143341 GTGATAATTT AATGTGTTTT AAAAAATGTA TCACAGTGAT GGCCTGGTGT GAAATAAATA 

143401 ATAAAATTTT AAGAATTAAA AAATATAAAA ATCTTTTATA TAGACATTAG GAGTTACAAG 

143461 GATAACTGTG AATTATAATT AGTAATTAAA TTGAAATACT GATTATTTTC ATTTTTATTT 

143521 AATTATTTAA TAAAACCTAT TTAACATTTA ATATTTATCA GTAATTAAAT CTAATTGTTA 

143581 ATATTTATTA TTATAAATTA TTTTAGAATT AAAAATAAGT GTAGAAGCGA GGCATGGTGG 

143641 CTCAAGCCTG TAATCCCAAC ACTTTGGGAG GCTAAGGTGG GAGGATTGCT TGAGCCCAGT 

143701 AGTTCAAGAC CAGCCTGGGC AACATGGAGA AACCCTGTCT CAATACAAAA AAATGAGCCA 

143761 TGTGTGGTGG TGCGTGCCTG TAGTCCCAGC CATTCTGGAG GCTGAGGTGG GAGGATGACT 

143821 TGAGCCTAGG CAGTCAAGGC TGCAGTGAGC CCTGATCTTG CCACTGCACT CCAGTCTGGG 

143881 CAACAGAGCA AGACCCTGTG TCAATATACA TATGGACAAA CTTAAAATTT AAAATGAAAG 

143941 CATACTACTG ATACAGAATT GAGTAGAGAT GCAAAGCTAG TCCTATAACC AGAACAATAA 
144001 AGATAAAAAG GAGAGTGGAA GAAGGTATGT CATGAATTTC ATGATAAATG GCAATTGCAA 
144061 ATATCCTGTA GCAGAACAAA ACAACAAAAC TGTAGATAAA ACATATCCAA CCCTTTGGAA 
144121 GGCCAAGGAG GGAGGATTGT TTGAGCCCAG AAGTTGGAGA CCAGCCTGGG CAACATAGTG 
144181 AGACCGTGTA TCTAAAAAGG AAGAAAGAAA AAAAAAAAAA GGATGATAAA GTAGACAATA 
144241 TTGAAAGCCA TTTTCTGCAA ATACATAGTG AATTTGATCA GTAATTTTCT TCCAACAGTG 
144301 CAAAAATGAA TAGATATTAG TTGCCTGAAA TAAAAATCAA ATATCCAACA AAAAATATTG 
144361 ACTATCTAAT AGTATCTAAG CTAGTAAATT TGGCCAGTTA TAAAATGTCT TAAATTTTTA 
144421 TTTAAAAAAA GAAAACCATA TTTATAAGAA GAGGTGATAA AGAGAAATTA TTTCAGTTAT 
144481 GAAGATTTTG TTAQAAAACT ATGAGAAAAA AACTATTTTT TGTTTTCAAA AAGTGAAAGA 
144541 TTAAGTTACC AAACAGTTGC TAAAGAATAC CAGATGGCTG AGCGTGGTGA CTTATGCCTG 

144601 TAATCCCAGT ACTTTGGAAG GCCAAGGCAG GAGGATCATT TTAGGCCTGG AGTTCGAGAC 
144661 CAGCCTGGGC ACTGTAGCAA GACCCGTCTC TATTAAAAAA AAAAAAAAAA AAAAAAAAGA 
144721 ATACAAGACC TTGCTAACAA TAGCAAAGAT CAATTAATTC AAAATTTGAA AAACTGTAAT 
144781 TTATTTAGCT TTAGAGTACT CTCGTGATAT GAGATTGCCA AATTAATACT TTGGGTGCAT 
144841 TTCTTTTCTC AAAGGACTTG CAAATTTACA AAGAAGTGTT GAAGAAAAGC CACACATTGG 
144901 CAGGTAATGT TTGCAAAAGA CAGATCTGAT GAAGAACAAT ATTTTTAGAA TATACAAAGA 
144961 ATACTTAAAA CTCAACAGTA AGAAAATAAC CTGATTTAAA GCAGGCCAAT GACCTGAACA 
145021 TCTGTTCACC AAAGAAGATA CACAGATGCA AGTATGCATA TGAAAAGATG CTTGACATCA 
145081 TGTCATTAGG GAACTGCAAA TTAAAACAAG TAGATACCAC TGCATACCTA GTAGAATGAC 
145141 CAAAATTTAG AACACTGTCA GCACCAAAGG TTGCAAAGAT ATGTAGCAAT AGTAACTTGT 
145201 TCATTACTGG TGAGAATGCA AAATGTGCAA TCACTTTGGA AGACAGTTTG GTGGTTTCTT 
145261 ACAAAAGTAA CCATACTTTT ACCATAAGAT TCACCAATCA CACTCCTTAG TATTTATCCA 
145321 AAGGAATTGA AAACTTATCT CCACACAAAA ACCTGCACAT AGATGTTTAT AGCAGCTTTA 
145381 TTCATAATTT ATCCAAAACT TGGAAACAAG ATGTCTTTCA GTAGGTAAGT GGATAACTGT 
145441 GGTACTTCTG AATAATGGAA TGTTATTTAG AGTTAAAAAG AAATGCATTC ACTTTGGGAG 
145501 GCCGAAGTGG GTGGATTGCT TGAGGCCAGG AGTTTGAGAC CAGCCTGGTC AACATGGGAA 
145561 AACCCCAATT AGCCGGGCAT AGTGGCGTGA GCCTGTAATC CCAGCTACTC GGGAGGCTGA 
145621 QATATGAGAA TCGTTTGAAC CTGGGAGATG GAGGTTGCAG TGAGCCAGTG CCACTGCACT 
145681 TCAGCCTGGG CAACAGAGCA AGACTCCTCT GTCTCAAAAA AAAAAAAAAA AAGAAAGAAA 
145741 AGAAAAAAGA AAAAGAAAAA GAAAAGAAAC GATCAAGCCA TGAAAACACA TGAAGGAAAC 
145801 TTAAATGTAT GTTACTAAAA AGCCAACCTG AAAAGACTGC ATACTATATG ACTCCAACTG 

145861 ATGCAGGGCA AGCAAGCCAA AAATTAGGGC TTAGCCCGGG AAGAATTCAA GGGTGAAGTG 

145921 GTGGTGTTAG CAACTTTTAC TGAAGCAGCA GTGTACAACA GCAGAACAGG TACTGCTCCT 

145981 TGCTGAGCAG GGCTAACCCA TAAGTAATGT GCCCAGAGTA GCAGCTCAGG GGCAGTTCTG 

146041 CAGTAATATA CCTGCTTTTA GTTAAGTGCA TGTTAAGGGG GATTATGCAG AAATTTCTAG 

146101 AAAAAGAGTG GTAACTTCGG AGTAGGTACA GAGGAAAGAA GTCGATAATG TCCTGTTGTT 

146161 GCCATGGCAA CGAAAAACTG ACATGGCGCT GGTGGGCGTG TCTTATGGAG AGGTGCTTTA 

146221 ACCTCGTCCC TGTTTCGGCT AGTCTTCAAT CTGGTCCGGA GTAAAGTCCC TGCCTCCGGA 

146281 GTTCACTCCT GCTTCCTGCT TCACAACTGT ATGACACTCT AGAAAAGACA GTAACTATGG 

146341 ACACAGTCAA AAGATTAGTT GATAGAAATT GGGTGACAGG AAGTGTTGAA AAGGCAGAAC 

146401 ACAGGATTTT TAGGGCAGTG AAACTTCTGT GATACTATAA TGGTGAATAC ATGACATTAT 

146461 ACATTTGTCA AAACCCATAG AAAGCACAAC ACCAAGAATA AACCCTAATG TAAATTACAG 

146521 ACTTTCGTTG ATAATGACGT GTCAATGTAA GTTCAATTGT AATAAATGTA CTACTGTGGT 

146581 GCTGGATGTC TATGGTGGGG GGACATTTTT GCTTCAATAG TTACAGTTGA AGTAAATGTT 

146641 TGTGTTTCCC ACAATGCATA TGTAGAAACT CTCACATTCA ATGTGATGGT CTTTGGAGGT 

146701 GGGCTCTTTG GGTGATAGTT AGGTTTAGTT GAGATCCTAG CAGATCGAGT CTTCATGATG 

146761 GGCATGATGG GACTGGTCCC TTATAAGAAA AGACCAGAAA GCTAGCTCTC TCTTTGCCAT 

146821 GTGAAGACAT AGCAGGAAGG TAGCCATCTG CAAGCTAGGA AAGGGCCTTC ACAAAGAATC 

146881 AACTCAGACC TCAGAACAGT GAGAGATAAA TTGTCGTTGT TTAAGTCACT CAGGCTGTGG 

146941 TATTTTGTTT CAGCAGCCCA ACCTAAGACT GTTAATTGGA TTAGAAATTT CCTTTTGGGG 

147001 ATGGTGTGTG GCGGGCGGGG GGCGGGGAGT ACCTTTGTTA AGCTTTTATA TCAATGAGTT 

147061 TGTAGGCTTT TCTTTTTTGG TCATTGACTA GGACAGTTTA AATAGTATGA GTGTGAAGGA 

147121 GATTGTTGGT CATCTATTCG ATGTCCCTTC TCTGTTTTTT AATATGAGAA CTCCTGATTT 

147181 TCAGCCAACT ACCCTGGAAA AAAAGCTAAT CTTTCTGACT TCTTAAGTGT GGCCATGTAC 

147241 TAAATTCTGG CTAATGCAAG GCAAGCCAAA GGTTTTATGA TAGGTTTTAG GACACTAGAG 

147301 TAAAAGAGAG CTGTTGCACA CATQCTCTTC ACCCTACTTT TGTGTCCTTT TTTCCATCCT 

147361 ACAACTTGGG TTGTGAGTAT GATGGCTGGA ACTTTAGTGG CTCTCTTGGA TCCCAGGGGT 

147421 AATTGAGGGG TGGCTGGAAG GAATCTGTGA TTTTCTGGAG TTTCCATACA CAAACAAGAC 

147481 CTGGATTTTC TGGGCTTCCC AGACTTCCAC ATCTAGACTT GCTTTAAATG GGAGATAAAT 

147541 AAACTTGTTT CAGCCACTGT CATTTTGGGC TATTTTATAG AACTTAATCT AATCTTCAAG 

147601 GGTACATGAA TTGCTTTTCC TTAAAAAAAA AATCAGCCAT AAAATCATCT TCTTTTTTCT 

147661 TTTGTTCCCC ACATTATTTA GTTGGAGCTC TGTAACTTTT TTTTTTTTTT TTTTTGAGAC 

147721 AAGGTCTTGC TCTGTCACTT AGGCTGGAAT TCAGTGGCAT GACCATGGCT CACTGCAGCC 

147781 TTGCCCTCCT AGGCTCAAGC AATCCTCGTC TCAGCCTCCT GAGTAGCTGA AACTAAGGCA 

147841 CATGCCACCA TGCCCAGCTA ATTTCTTTTC TTTTAGAGAT GGGAGCCTTG CCCAGGCTAG 

147901 TCTCAAACTC CTAGCCTCAA GTGATCCTCC CATCTCAGCC TCCCAAAGTG ACAGGATTAC 

147961 AGGTGTGAGC CACCATGCCT GGCTGCTCTG TAAGTGTCTG AATTTCATTT TGTATTTATC 

148021 AGTCTGTTTA GATTTTCTTT CCCTTCTTGG GTCAGTTAGG CCATTGGTTT CTTTTTAAAG 

148081 GTTTTCAAAT TTATTTGCAT CTAATTCTTC AAATTACTCT CAAAATTATT CCAGTATATA 

148141 TTCTTTTGTT CCTATTTTCT TCTGTATTCT TTATTAAAAT AGCTAATGAT TTATCTAGCA 

148201 GGACTTATAT TCTTTCCATA ACTTTCCTGC ACCCCAATTA ATCTCCAATT TTATATTTCT 

148261 TCTGGCCTTC CTTATAGTTT CCACAGGTTT ATTTTATTCA TTTTTTAAAA CTTTTATTTA 

148321 ATTGTTTATT TTATTATCAT TCTTTCTTAT TCAGCAATCT AAGTGCTTAG GGATATAQAA 

148381 TTTCCTCTAA GCAGCATATG CTAGGCTTTA ACAATGTTAG GGAGGCCTCC CCTTTCTGGG 

148441 GAAGACCACA CTTACATTAA CACAGGACTG TGGGATGCCA AGAGGTAGAG AAGAGCTTAT 

148501 GAATATCCAG ATTACATCTT CACTGATCCT GCACAAAGGT GGGGTTCCTC GGTTACCCAC 

148561 TGGGTCCTAT TACCCAAGTC TGGGTCAGCA TACCGAGACT ACGGGTATAT AGAACAAGTG 

148621 CAACTGGCGA TAATCCTTCT GTTGGGGAGA AAAATCTTTT TTTTCTATTC ATCTTAGGTT 

148681 CTCCATCTGT GGCCCTATCA AGTAGACTAA CAAAAGACAG ATTGACAAGA CAGAAACAAA 

148741 GCATGTGCAT TGTACAAACA CAGGGGAGTA CTGAGATGAA TACTCAAAAG AGGATTTAGA 

148801 ACTTGGGCTT ATATAGCATT TTAAGAAAAG AATACATTTT TTAAGTGACA AGGAAGACGA 

148861 AAAGGACTTT GAGTTTCTAG TGCAGTAAAT TGTGGGAAGG CAACTTTTTC TTTCCCTTTT 

148921 TTTTTTTTTT TTTTTAAAAA AAAAGACTTC TCTGGTGCTA TGTCCAGGCT GATAAGAGTC 

148981 TAAAGTCTCT GGTGACTAAC TTTTGTTCTT CCCCGAGTAA GAAGACACCT TCACAATTTC 
149041 ATATCCTGCT TTTAGGCAAA TAGGGAGAGG GCAGAGGTGT TTGTTTGTTT TTAATCTATT 

149101 TTTTTTCTCA ATTGTCTTCA ACTCAAAATA CTTCTTATGC CAAAGATGGC ATATTCTGCT 

149161 ACCCTTCACT TACTACTTAC AACCCAGCCT CTATCATCAT AATTAGAACT TCTGACCCTG 

149221 GGGAACATGG GCAATAGTTT GAACTCTTTT ATATCTCCCT TAGGCAGAGA TGGAGGCCCA 

149281 GCCATGCCTC TGACATCTAG ACACAACTGT TGCTTCATTT CTCCTATTCT CAGAGGTGAT 

149341 GTTGTAGGAC TTCAACAAAT ATCAGTAAAC ATTAATTTTT TTTTTCCTTG AGGCACAGCA 

149401 TGATCTTGGC TTACTGCAGC TGCTGCAGGC TCAAGCAATT CTCCTGCCTT GGCCTCACGA 

149461 GTAGCTGGGT TACAGGCCCC TACCACCATG CCCGGCTAAT TTTTGTATTT TTAGTAGAGA 

149521 CAGGGTTTCA CCATGTTGGC CAGGCTGGTG TTGAACTCCT GACCTCAAGT GATCCACCTG 

149581 CCTCAGCCTC ACATAGTTCT GGGATTACAG GCGTGAGCCA CCATGCCTGG CCATCAATTT 

149641 TTATGTCAAC TCTAAATTAT AACATTTAGC AATTTTGTGA CTTTTTATGG TCATCATTAA 

149701 TGTTGTTTAT GTTTTAGTTG TAGTCCTGTC ATTACTCACT CGGGTATGGT AATTTGGTCT 

149761 TTTTCAAAAT GAAGTTAAGG TCTATTTGCT CTTCTCTGAA TCATAATAAG AACTGCCAAC 

149821 AGCCATTTCA GCAATAACTA TTTACTGAGA TTTTAAAATA TTTCAAGGTA ATTGGTCCTA 

149881 GCAGACTGGA AAATACCAAA TTCTTTTCCA GAACTGAATC CCCCATCAAA GTTCAATTTT 

149941 ACTCATAATT CCCTTTTCAT TTGAAGCATC TCATTGTAAG CCAGTCTTAA CCCTTCTCTC 

150001 ACACTTTGCT TGGCTGTTTC TCAGGTAGAA CTCAGTAAGT CTGGTAGCCT CCAGGACTGC 

150061 CGCTTAGATT ATTAAACAAC ATGTCAGTGG TTGGAAGAGT CAATGTTATT TTGATTTTTC 

150121 TGTTTTGTTT TGTTTTAAAT GCAGTTGGCG GATAATTGCA GCTTTCTTTC ATTCCCTACA 

150181 TGAGTTCAAA TGGCAGCAAA CAAACTAGGA GAACGCAGAC CTTCTGACTT GTGGGTACCC 

150241 CTACTCATCA CCTGAAGACC CTTGGAAATC AAAGCCCTGA CCCATTAAAG ACGGATGGAG 

150301 ACAGCAACAT ACGATCATCA CTATTATCTT GCTTTGCCCC AGTCCAGGTT AACCATCTGT 

150361 GGTATTTTTA GTTGCTAAGT CCATATATTC AACATAAATC AATTATATAT CCACTAAAAT 

150421 CTCAGCACTA GTCTAACTAC TAAGGAAATG ACAGCGAAGA AAACAGACCA AACGTCTGCC 

150481 CTTATGGGAT TTATATTATT TTCTCTGTGC TGGTTAAACC AAGGAGCTTC TGCTCTTTTC 

150541 CTTAGTCACC TGGGGGAGGC AGAAACAAAG GAGAATATTG ATAAACCTGG AAATAGGGCC 

150601 GGAGAGTATC AGAGAAGGAA GCCTTCGGGA AAGTAAAGAT GTGGCAGCCA GTATTCCCGT 

150661 TATAAAAGGA TACAACTCCG GCCTCATAGT CCAGAAAAAT TCCCACAAGC AGGGGCTGCT 

150721 CATGCAGATG AAGGGAAGTT GGGGGAGAAG TAAGTGCTAC ATAGCCTTTC TTTTTGCACA 

150781 GCCTGAGGGT CCAGAATCCA GACTGAGGCT CTTGCTTCAT GCCAGTGCCC CTCTGCACAT 

150841 TTTCCATACA AACTCCTAAA TCCCATCCGG TTCCTTCGCC AACATCCACT TCAAAGTAAC 

150901 GTCTTCCTGA GGTGAAGCCT TCACAACCCA AGACACAGGG GAAGGCAGTA AATCTCCTGG 

150961 AAGATGTGTC CTGATTCTCC TGGGTGTATC CACGAGTCAC TTGTCTCCGA TCCTCAGAGA 

151021 GAATTAGTTC GTGATGAGCT GTATCTGGAT CCAGAGTCAC ACTAACTGCA AAACAAAACA 

151081 AAACAAACAA AAATAATTTT GTTGCTGTGA AGAACACAGG TTATTTTATT TTATTTTATT 

151141 TTGAGATGGA GTGTTGCTGT CACCCAGGCT GGAGTGCACT GGCACTATCT CAACTCACTG 

151201 CAACCTCCAC CTCCTGGATT CAGGCAATTC TCCTGCCTCA GCCTCCGGAG TAACTGCGAC 

151261 TACAGGTGCG CACCACCACA AGTGGCTAAT TTTTTTAAAT TTTCTGTAGA GATGGGGTTT 

151321 CGCCATGTTG GCCAGGCTGG TCTCAAACTC CTGACCTGAA GTGTTCCACC CACCTCGGCC 

151381 TCCCAAAGTG CTGGATTACA CAGGTGTGAG CCACCATGCC CAGCCACAAG TTATTTTCAA 

151441 TAAAACCAGC CTGTGTTCAA ACCCAACTAT TGTTTCTTAT AAACTGGGTG AGCTTAGGCA 

151501 AATCATTTAA CTTTCTGAGC CTCAGTTTGT TAACTATAAA GTGGAAATTA CCGTATTTGT 

151561 TGCAGAGAAT GGTGGGTAGG ATTGAATAAG CTTATGTTTG CTTAATGCTT GGTAAAATTC 

151621 CTGGTACATG GTAACCACCT AATAAGTGGT AGTTGTTGGG GTGATCAGGC CCAACACCAG 

151681 GCCGTGGGGG CTACAAAGTC CGGCGGGGTC AAAGGAATGA GAAAAGACAA GTTAAGAGTG 

151741 CATAAAGTGG GTCCAGGGTG CCAGCACTAG ATTGGAGGCT GCAAAGGCCC TAAGCTCTGG 

151801 GAGCCCACAC TATTTATTGG TGATCAAACA AAGAAGCAGG TGGTGAQGAC GTGAGGGTAA 

151861 ACAGGTGAGG GCATGAGGAC ATGGGGGTAG AAAGGTAGTG GTGCATTAAG CGTAGCTGTG 

151921 ACAGTTTAGC ATTTTCTTTG ACACATGTAG AATATACTCT GCTGCTTGAG ATAGTAGAGG 

151981 ACACGTTTAT GAGTGAAAAG CAAGGAACCA ACAAGTCTGT GCACTTTCCA GAGGCTATGA 

152041 GGGGTTTTAT GCCCTGAGCC CTGGGTTCCA TCCAAGCCAC AAGGGGTTTT ATGCCCTAGG 

152101 CTTAGATTTG TGGTGCGGCA GGGCAGCCTT CCACCATTTG GCACAGAGCT TGGTGTTCCA 

152161 AAGGCCACGA GGGGTTTTGG ACCCTGGACC CCGGACATCT TCCAAGACTC TTTTACATTA 

152221 TGACAGACAA GCCAGTCCTG CTTCAGCTCT TCTAACAACA TGTAGTAATA ATGATATCAT 
155521 GGTGGCAAAG GGAGACCCTG TCTCAAAAAA AAATTAAAAA ATTAGCCAGG TATGGTGGCC 

155581 TGTTCCTGTA GTCCCAGCAA CTGGGGAGGC TGAGGTGAGA AGATCACTTT AGCTCAGGTG 

155641 GTGGAGCCAT GATCGCACCA CTGTACCACT CGGCTTGGGC AACAGAGTGA GAGCCTGTCT 

155701 CGAAAAAACA AATATATACA CACAGTAATC AATATATATA TTATATGTAC CAATCAATGC 

155761 TTCACTTTTA TATATAATAT AGATTACATC TTATTAGATA TATAGTATTC CTTCTCCATA 

155821 GATAGATAGA TACAGATATA GACATAGTAT CCTCTATCCA TATTAGAGAG AGGATACTAT 

155881 ATATATCTAT AGCATATAGA GATGCTGTCT CAAAAAAATT TAAACATCAG CCAGATGTGG 

155941 TGGCCCATGC CTGTAGTCCC AGCTACTGGG GAGGCTGAAA TGAGAGGATT GCCATTGATC 

156001 CTCTCATTGG TTGAGCCATA ATCGCACTAC TGCACCACTC AGCCTGGGAG ACAGAGGGAG 

156061 ACCTGAGGTG GAAGGATATA GATATAGATA TATAAATAAA TATGTATAGA GAGAATATAA 

156121 TATATGTGTG TATGTGTATA TATATATATT ATGAAGACAC TGGGAGAGAA TACTATATAT 

156181 ATATGTGTGT GTGTATATAT ATATTATGAA GACACTGGTG GGATGGTTTC ATTACCAATT 

156241 GGACCAAGAG TCCAGGTATG GAGCCAACAT GCAATGTTGT TGTTGACTGA GCTGGCAGAG 

156301 CACTGGTCAT AGTTACGGGA AAAGAAGGTC TCCAATGAGA CATACTTAAC AAAATATATG 

156361 AACTTGCCAT ATACGTGGAG AGTTCTGGTG TGTATATAGC CTTCTCTCAC CAACCTAGCA 

156421 ATTGTCTTCA TCATCATTAT AATGCTATCA GAGCAAAGAT GACAGCTAAA TTTTTTTGTC 

156481 CCTTTCTTCT TCTTTCTCTT CCTTCCCCTC CCCCACCTCT TTCTCTTCCT CCTCCTCCTT 

156541 CATCTCTCTT CTTTTTTTTT TTGAGATGGA GTCTTACTCT GTCGCTCAAG CTGGAGTGCA 

156601 GTGGCACAAT CTCAGCTCAC TGCAACCTCT GCCTTCTGGG TTCAAGCAAT TCTGCCTAAG 

156661 CCTCCAGAGT AGCTAGGACT GCAAGTGCAC ACCACCACAC CTGGCTAATT TTTGTATTTT 

156721 TAGTAGAGAT AGGGTTTCAC AATGCTGGCC AGGCTGGTCT CAAACTCCTG CCCTCAAGTG 

156781 ATCCTCCTGC CTCGGCCTCC CAATGTGCTG GGATTACAGG CGTAAGCCAC TGTACCCGGC 

156841 CTCCTCCTTT AATAGACAGG GTCTAGCTCT GTTGCCCAGG CTGGGTACAG TGGCGTGATC 

156901 ATAGCTTACT GCAGCCTCGA ACTCCTGGGC TCAGGAGATC CTCCTGCCCT AGTCTCCCCA 

156961 GTAGCTGGAA CTACAGGCAT AGCACACGGG GCTAATAAAA TTAATTAGGT GATAAAATTC 

157021 ACTGCCCACT GATGACTAAG CTCTTTGGAC ATAAAAGACA CAGACCTTGA AGGAAAATGT 

157081 GTCTACTTAA TTTTGAAACC CTATTTATCA AAAAACAGGA TGAAAATGCA AAATGCCATC 

157141 CACATGCCAG AAGATATCAG CTATAATAAG TTCCCATAAA TCAATAAGGA AAAGAACCCA 

157201 ATAAAAATTA TTAAACCACA GTAAATCATG GGTAAATCAC AGAGGCCTGA AGGGCTAATG 

157261 GACATACAAA AAGAATCTCA ATCTCACTAG TGAAATCAGA AAAGCACAAA TTAAGTACAC 

157321 AATTAGGTAC CATTTTAAAT CTGTAAGACT GTCAAAATCA TAAATTATAT AAGTAAAGAC 

157381 TCAGGGAGTT TTGGAGGAGT GAGAGCTCTT ATATTGCTTG TGGGGTAGAA TTGGAACAAT 

157441 TTCAAGATCT GTAGTATCTG GTAAAATTAT GATATGCATC CCTCACACCA GCATGTCACT 

157501 CCAAGGTATC TCCCTGGAGG GAACATTTAC GGGACACAAG GAAGCATGGA TAAGAATGTT 

157561 CACAGTAGTA TTGTCTGCAA CAGCAACAAC AACAAAAAAA CCCAACTACA CACAACTTCA 

157621 ATGCCCAGTC CACAAGGCAA TGGATTAAAT AAACTTCAGG CCGGAGATGG TGGTTCATGC 

157681 CTGTAATCCC AACACTTTAG AAGGCCGAGG CGAGAGGACT GCTTGAGCCC AGGAGTTCAA 

157741 GACCAGCCTG AACAAAATAA AGAGATAGTG TTTCTACAAA AAATTTTTAA AAAATTAGCC 

157801 AGACGTGGCA GTGCTTGCCT GTGGTCCCAG CTACTGGGGA AGCTGACGTG GGAGGATTGC 

157861 TTAAGCCCAG GAATTTAAGG CTGCAGGGAG CCATGATGGG GCCATTGCAC TCCAGCCTGG 

157921 GTGACAGAGT GAGACCCTGT CTAAAAGAGA TAAGTAAATA ACAACTTTGC ATTTTCTGCC 

157981 ACATTGCAAA ATGGTGAGAG AGTGGTTTCT AGACTCTAGA CTCTTTCTAT GACTACCTTC 

158041 TAGTTATGAG ATCCTACAAC ACTCACCTAA CCTCTCTGTG TCATATTTCC TCCTCTATAA 

158101 AGCAAAAATG CCCCATATAG AGAGGACTGT GATATAAAAC AAGAACCAAG AAAAGTAAAG 

158161 CTTTTCTAAT CTGTCACAGA CTAAAGAGTG CTCAGTATAT GTGAGTCATT ATTCCTGGTG 

158221 CTGGTAGGAG TGTATGTTAC AACTTTGAGT CAAGTAATAT GGTACCATAT ATTAAGATTA 

158281 ACAACAACCT CGGCAATCCC AGTTTGGGGT ATGTTCCCAA AAGAAATGAA AGCACCAGGA 

158341 TATAAGGATG CATGGACTAG AAAGTTATTG TAGCAACATT GTAATAACTA AGTTCTAAAA 

158401 ACAGCCTGAA GCTCCATCAG TAGGGATATG GTTACATATA TTTATTATAT TCTTATGGAA 

158461 TATTAGACAT AAAAAGTAAC GAGTAACATA GAAGAGACAG TGTATATATG TTACGTTTGT 

158521 ACAAACTTAG GGAAAGATAT AGATCACCCT ACCTAGAGAA GTCAGATTGG AGACGGGTGG 

158581 GAAAAACCTT GAACTTTCTC CTTATATCCT TTATATTGTT TGACTGATTA AAATGTATTT 

158641 GTTGCATCTG CTTGAAGGCA ATGTAAAATA AAATAAACAT ACATTTAAAA ATAAAAATAA 

158701 AATTTATTCC TATCACTTTT GTAATAAAGC TGGQCACAGT GACTAACACT TGTAATCCTA 